Preface


As many countries struggle to recover from the recent global financial crisis, one
thing is clear: we do not want to suffer another crisis like this in the future.
We must study the past in order to prevent future financial crises. Financial data
of the past few years have thus become important in empirical study. The primary
objective of the revision is to update the data used and to reanalyze the examples
so that one can better understand the properties of asset returns. At the same time,
we have also witnessed many new developments in financial econometrics and financial
software packages. In particular, Rmetrics now has many packages for analyzing
financial time series. The second goal of the revision is to include R commands and
demonstrations, making it easier for readers to reproduce the results
shown in the book.

Collapses of big financial institutions during the crisis show that extreme events 
occur in clusters; they are not independent. To deal with dependence in extremes, 
I include the extremal index in Chapter 7 and discuss its impact on value at risk. 
I have also rewritten Chapter 7 to make it easier to understand and more complete. It
now contains the expected shortfall, or conditional value at risk, for measuring
financial risk.

Substantial efforts are made to strike a balance between the length and coverage
of the book. I do not include credit risk or operational risk in this revision
for three reasons. First, effective methods for assessing credit risk require further 
study. Second, the data are not widely available. Third, the length of the book is 
approaching my limit. 

A brief summary of the added material in the third edition is: 


1. To update the data used throughout the book.
2. To provide R commands and demonstrations. In some cases, R programs are
given.
3. To reanalyze many examples with updated observations.
4. To introduce skew distributions for volatility modeling in Chapter 3.
5. To investigate properties of recent high-frequency trading data and to add
applications of nonlinear duration models in Chapter 5.
6. To provide a unified approach to value at risk (VaR) via loss function, to
discuss expected shortfall (ES), or equivalently the conditional value at risk
(CVaR), and to introduce extremal index for dependence data in Chapter 7.
7. To discuss application of cointegration to pairs trading in Chapter 8.
8. To study applications of dynamic correlation models in Chapter 10.


I have benefited greatly from constructive comments of many readers of the second
edition, including students, colleagues, and friends. I am indebted to them all. In
particular, I would like to express my sincere thanks to Spencer Graves for creating the
FinTS package for R and to Tom Doan of ESTIMA and Eugene Gath for careful
reading of the text. I also thank Kam Hamidieh for suggestions concerning new
topics for the revision. I would also like to thank colleagues at Wiley, especially Jackie
Palmieri and Stephen Quigley, for their support. As always, the revision would not
be possible without the constant encouragement and unconditional love of my wife 
and children. They are my motivation and source of energy. Part of my research 
is supported by the Booth School of Business, University of Chicago. 

Finally, the website for the book is: 


http://faculty.chicagobooth.edu/ruey.tsay/teaching/fts3.


Ruey S. Tsay


Booth School of Business, University of Chicago 
Chicago, Illinois 


Preface to the Second Edition 


The subject of financial time series analysis has attracted substantial attention in 
recent years, especially with the 2003 Nobel awards to Professors Robert Engle and 
Clive Granger. At the same time, the field of financial econometrics has undergone 
various new developments, especially in high-frequency finance, stochastic volatility,
and software availability. There is a need to make the material more complete
and accessible for advanced undergraduate and graduate students, practitioners, and
researchers. The main goals in preparing this second edition have been to bring the
book up to date both in new developments and empirical analysis, and to enlarge
the core material of the book by including consistent covariance estimation under
heteroscedasticity and serial correlation, alternative approaches to volatility modeling,
financial factor models, state-space models, Kalman filtering, and estimation
of stochastic diffusion models.

The book therefore has been extended to 12 chapters and substantially revised 
to include S-Plus commands and illustrations. Many empirical demonstrations and 
exercises are updated so that they include the most recent data. 

The two new chapters are Chapter 9, Principal Component Analysis and Factor 
Models, and Chapter 11, State-Space Models and Kalman Filter. The factor models
discussed include macroeconomic, fundamental, and statistical factor models.
They are simple and powerful tools for analyzing high-dimensional financial data
such as portfolio returns. Empirical examples are used to demonstrate the applications.
The state-space model and Kalman filter are added to demonstrate their
applicability in finance and ease in computation. They are used in Chapter 12 to
estimate stochastic volatility models under the general Markov chain Monte Carlo 
(MCMC) framework. The estimation also uses the technique of forward filtering 
and backward sampling to gain computational efficiency. 

A brief summary of the added material in the second edition is: 


1. To update the data used throughout the book.
2. To provide S-Plus commands and demonstrations.
3. To consider unit-root tests and methods for consistent estimation of the
covariance matrix in the presence of conditional heteroscedasticity and serial
correlation in Chapter 2.
4. To describe alternative approaches to volatility modeling, including use of
high-frequency transactions data and daily high and low prices of an asset in
Chapter 3.
5. To give more applications of nonlinear models and methods in Chapter 4.
6. To introduce additional concepts and applications of value at risk in
Chapter 7.
7. To discuss cointegrated vector AR models in Chapter 8.
8. To cover various multivariate volatility models in Chapter 10.
9. To add an effective MCMC method for estimating stochastic volatility models
in Chapter 12.


The revision benefits greatly from constructive comments of colleagues, friends,
and many readers of the first edition. I am indebted to them all. In particular, I
thank J. C. Artigas, Spencer Graves, Chung-Ming Kuan, Henry Lin, Daniel Peña,
Jeff Russell, Michael Steele, George Tiao, Mark Wohar, Eric Zivot, and students
of my MBA classes on financial time series for their comments and discussions
and Rosalyn Farkas for editorial assistance. I also thank my wife and children for
their unconditional support and encouragement. Part of my research in financial
econometrics is supported by the National Science Foundation, the High-Frequency
Finance Project of the Institute of Economics, Academia Sinica, and the Graduate
School of Business, University of Chicago.


Finally, the website for the book is: 


gsbwww.uchicago.edu/fac/ruey.tsay/teaching/fts2. 


Ruey S. Tsay 


University of Chicago 
Chicago, Illinois 


Preface to the First Edition 


This book grew out of an MBA course in analysis of financial time series that I have 
been teaching at the University of Chicago since 1999. It also covers materials of 
Ph.D. courses in time series analysis that I taught over the years. It is an introductory 
book intended to provide a comprehensive and systematic account of financial 
econometric models and their application to modeling and prediction of financial 
time series data. The goals are to learn basic characteristics of financial data, 
understand the application of financial econometric models, and gain experience in 
analyzing financial time series. 

The book will be useful as a text of time series analysis for MBA students with 
finance concentration or senior undergraduate and graduate students in business, 
economics, mathematics, and statistics who are interested in financial econometrics. 
The book is also a useful reference for researchers and practitioners in business, 
finance, and insurance facing value at risk calculation, volatility modeling, and 
analysis of serially correlated data. 

A distinctive feature of this book is that it combines recent developments
in financial econometrics from the econometric and statistical literature. The
developments discussed include the timely topics of value at risk (VaR),
high-frequency data analysis, and Markov chain Monte Carlo (MCMC) methods. In
particular, the book covers some recent results that are yet to appear in academic
journals; see Chapter 6 on derivative pricing using jump diffusion with closed-form
formulas, Chapter 7 on value at risk calculation using extreme value theory based on
a nonhomogeneous two-dimensional Poisson process, and Chapter 9 on multivariate
volatility models with time-varying correlations. MCMC methods are introduced
because they are powerful and widely applicable in financial econometrics. These
methods will be used extensively in the future.

Another distinctive feature of this book is the emphasis on real examples and 
data analysis. Real financial data are used throughout the book to demonstrate 
applications of the models and methods discussed. The analysis is carried out by 
using several computer packages: SCA (Scientific Computing Associates)
for building linear time series models, RATS (regression analysis for time series)
for estimating volatility models, and S-Plus for implementing neural networks
and obtaining PostScript plots. Some commands required to run these packages
are given in appendixes of appropriate chapters. In particular, complicated RATS 
programs used to estimate multivariate volatility models are shown in Appendix 
A of Chapter 9. Some Fortran programs written by myself and others are used 
to price simple options, estimate extreme value models, calculate VaR, and carry 
out Bayesian analysis. Some data sets and programs are accessible from the World 
Wide Web at http://www.gsb.uchicago.edu/fac/ruey.tsay/teaching/fts. 

The book begins with some basic characteristics of financial time series data in 
Chapter 1. The other chapters are divided into three parts. The first part, consisting 
of Chapters 2 to 7, focuses on analysis and application of univariate financial time 
series. The second part of the book covers Chapters 8 and 9 and is concerned with 
the return series of multiple assets. The final part of the book is Chapter 10, which 
introduces Bayesian inference in finance via MCMC methods. 

A knowledge of basic statistical concepts is needed to fully understand the book. 
Throughout the chapters, I have provided a brief review of the necessary statistical 
concepts when they first appear. Even so, a prerequisite in statistics or business 
statistics that includes probability distributions and linear regression analysis is 
highly recommended. A knowledge of finance will be helpful in understanding the 
applications discussed throughout the book. However, readers with advanced
background in econometrics and statistics can find interesting and challenging topics in
many areas of the book.

An MBA course may consist of Chapters 2 and 3 as a core component, followed
by some nonlinear methods (e.g., the neural network of Chapter 4 and the applications
discussed in Chapters 5 to 7 and 10). Readers who are interested in Bayesian
inference may start with the first five sections of Chapter 10.

Research in financial time series evolves rapidly and new results continue to 
appear regularly. Although I have attempted to provide broad coverage, there are 
many subjects that I do not cover or can only mention in passing. 

I sincerely thank my teacher and dear friend, George C. Tiao, for his guidance,
encouragement, and deep conviction regarding statistical applications over the
years. I am grateful to Steve Quigley, Heather Haselkorn, Leslie Galen, Danielle
LaCouriere, and Amy Hendrickson for making the publication of this book possible,
to Richard Smith for sending me the estimation program of extreme value
theory, to Bonnie K. Ray for helpful comments on several chapters, to Steve Kou
for sending me his preprint on jump diffusion models, to Robert E. McCulloch for
many years of collaboration on MCMC methods, to many students in my courses
on analysis of financial time series for their feedback and input, and to Jeffrey
Russell and Michael Zhang for insightful discussions concerning analysis of
high-frequency financial data. To all these wonderful people I owe a deep sense of
gratitude. I am also grateful for the support of the Graduate School of Business,
University of Chicago and the National Science Foundation. Finally, my heartfelt
thanks to my wife, Teresa, for her continuous support, encouragement, and
understanding; to Julie, Richard, and Vicki for bringing me joy and inspiration; and to
my parents for their love and care.


R. S. T. 


Chicago, Illinois 


CHAPTER 1 


Financial Time Series 
and Their Characteristics 


Financial time series analysis is concerned with the theory and practice of asset 
valuation over time. It is a highly empirical discipline, but, like other scientific
fields, theory forms the foundation for making inference. There is, however, a
key feature that distinguishes financial time series analysis from other time series 
analysis. Both financial theory and its empirical time series contain an element of 
uncertainty. For example, there are various definitions of asset volatility, and for a 
stock return series, the volatility is not directly observable. As a result of the added 
uncertainty, statistical theory and methods play an important role in financial time 
series analysis. 

The objective of this book is to provide some knowledge of financial time 
series, introduce some statistical tools useful for analyzing these series, and gain 
experience in financial applications of various econometric methods. We begin 
with the basic concepts of asset returns and a brief introduction to the processes 
to be discussed throughout the book. Chapter 2 reviews basic concepts of linear 
time series analysis such as stationarity and autocorrelation function, introduces 
simple linear models for handling serial dependence of the series, and discusses 
regression models with time series errors, seasonality, unit-root nonstationarity, and 
long-memory processes. The chapter also provides methods for consistent estimation
of the covariance matrix in the presence of conditional heteroscedasticity and
serial correlations. Chapter 3 focuses on modeling conditional heteroscedasticity
(i.e., the conditional variance of an asset return). It discusses various econometric
models developed recently to describe the evolution of volatility of an asset return
over time. The chapter also discusses alternative approaches to volatility modeling,
including use of high-frequency transactions data and daily high and low prices of
an asset. In Chapter 4, we address nonlinearity in financial time series, introduce
test statistics that can discriminate nonlinear series from linear ones, and discuss
several nonlinear models. The chapter also introduces nonparametric estimation
methods and neural networks and shows various applications of nonlinear models
in finance. Chapter 5 is concerned with analysis of high-frequency financial data, the
effects of market microstructure, and some applications of high-frequency finance.
It shows that nonsynchronous trading and bid-ask bounce can introduce serial
correlations in a stock return. It also studies the dynamics of time duration between
trades and some econometric models for analyzing transactions data. In Chapter 6,
we introduce continuous-time diffusion models and Ito's lemma. Black-Scholes
option pricing formulas are derived, and a simple jump diffusion model is used
to capture some characteristics commonly observed in options markets. Chapter 7
discusses extreme value theory, heavy-tailed distributions, and their application to
financial risk management. In particular, it discusses various methods for calculating
value at risk and expected shortfall of a financial position. Chapter 8 focuses
on multivariate time series analysis and simple multivariate models with emphasis
on the lead-lag relationship between time series. The chapter also introduces
cointegration, some cointegration tests, and threshold cointegration and applies the
concept of cointegration to investigate arbitrage opportunity in financial markets,
including pairs trading. Chapter 9 discusses ways to simplify the dynamic structure
of a multivariate series and methods to reduce the dimension. It introduces
and demonstrates three types of factor model to analyze returns of multiple assets.
In Chapter 10, we introduce multivariate volatility models, including those with 
time-varying correlations, and discuss methods that can be used to reparameterize 
a conditional covariance matrix to satisfy the positiveness constraint and reduce the 
complexity in volatility modeling. Chapter 11 introduces state-space models and 
the Kalman filter and discusses the relationship between state-space models and 
other econometric models discussed in the book. It also gives several examples 
of financial applications. Finally, in Chapter 12, we introduce some Markov chain 
Monte Carlo (MCMC) methods developed in the statistical literature and apply 
these methods to various financial research problems, such as the estimation of 
stochastic volatility and Markov switching models. 

The book places great emphasis on application and empirical data analysis.
Every chapter contains real examples and, on many occasions, empirical characteristics
of financial time series are used to motivate the development of econometric
models. Computer programs and commands used in data analysis are provided
when needed. In some cases, the programs are given in an appendix. Many real
data sets are also used in the exercises of each chapter.


1.1 ASSET RETURNS 


Most financial studies involve returns, instead of prices, of assets. Campbell, Lo, 
and MacKinlay (1997) give two main reasons for using returns. First, for average 
investors, return of an asset is a complete and scale-free summary of the investment 
opportunity. Second, return series are easier to handle than price series because 
the former have more attractive statistical properties. There are, however, several 
definitions of an asset return. 


Let \(P_t\) be the price of an asset at time index \(t\). We discuss some definitions of
returns that are used throughout the book. Assume for the moment that the asset
pays no dividends.


One-Period Simple Return 
Holding the asset for one period from date \(t-1\) to date \(t\) would result in a simple
gross return:

\[ 1 + R_t = \frac{P_t}{P_{t-1}} \quad \text{or} \quad P_t = P_{t-1}(1 + R_t). \tag{1.1} \]


The corresponding one-period simple net return or simple return is

\[ R_t = \frac{P_t}{P_{t-1}} - 1 = \frac{P_t - P_{t-1}}{P_{t-1}}. \tag{1.2} \]


Multiperiod Simple Return 
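Holding the asset for \(k\) periods between dates \(t-k\) and \(t\) gives a \(k\)-period simple
gross return:

\[
\begin{aligned}
1 + R_t[k] = \frac{P_t}{P_{t-k}}
  &= \frac{P_t}{P_{t-1}} \times \frac{P_{t-1}}{P_{t-2}} \times \cdots \times \frac{P_{t-k+1}}{P_{t-k}} \\
  &= (1 + R_t)(1 + R_{t-1}) \cdots (1 + R_{t-k+1}) \\
  &= \prod_{j=0}^{k-1} (1 + R_{t-j}).
\end{aligned}
\]

Thus, the \(k\)-period simple gross return is just the product of the \(k\) one-period simple
gross returns involved. This is called a compound return. The \(k\)-period simple net
return is \(R_t[k] = (P_t - P_{t-k})/P_{t-k}\).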
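
The definitions in Eqs. (1.1) and (1.2) and the compounding identity can be checked numerically. The following is a minimal Python sketch, not code from the book; the price series and the function names `simple_returns` and `k_period_return` are hypothetical and chosen only for illustration.

```python
from math import prod

def simple_returns(prices):
    """One-period simple net returns R_t = P_t / P_{t-1} - 1, as in Eq. (1.2)."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]

def k_period_return(returns):
    """k-period simple net return: product of the gross returns, minus 1."""
    return prod(1.0 + r for r in returns) - 1.0

# Hypothetical prices (illustration only): +2%, -2%, +5% over three periods.
prices = [100.0, 102.0, 99.96, 104.958]

R = simple_returns(prices)
compound = k_period_return(R)            # via the product of gross returns
direct = prices[-1] / prices[0] - 1.0    # via (P_t - P_{t-k}) / P_{t-k}

print(R)
print(compound, direct)                  # the two agree up to rounding error
```

Both routes give the same three-period return, which is the content of the product formula above.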

In practice, the actual time interval is important in discussing and comparing
returns (e.g., monthly return or annual return). If the time interval is not given,
then it is implicitly assumed to be one year. If the asset was held for \(k\) years, then
the annualized (average) return is defined as

\[ \text{Annualized}\,\{R_t[k]\} = \left[ \prod_{j=0}^{k-1} (1 + R_{t-j}) \right]^{1/k} - 1. \]

This is a geometric mean of the \(k\) one-period simple gross returns involved and
can be computed by

\[ \text{Annualized}\,\{R_t[k]\} = \exp\left[ \frac{1}{k} \sum_{j=0}^{k-1} \ln(1 + R_{t-j}) \right] - 1, \]

where \(\exp(x)\) denotes the exponential function and \(\ln(x)\) is the natural logarithm
of the positive number \(x\). Because it is easier to compute an arithmetic average than
a geometric mean and the one-period returns tend to be small, one can use a first-order
Taylor expansion to approximate the annualized return and obtain

\[ \text{Annualized}\,\{R_t[k]\} \approx \frac{1}{k} \sum_{j=0}^{k-1} R_{t-j}. \tag{1.3} \]

Accuracy of the approximation in Eq. (1.3) may not be sufficient in some applications,
however.
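
To see how far the arithmetic approximation in Eq. (1.3) can drift from the geometric mean, one can compare the two directly. This is a small Python sketch under assumed inputs; the yearly returns and the function names are hypothetical.

```python
import math

def annualized_geometric(returns):
    """Geometric-mean annualized return, computed as exp(mean of ln(1 + R)) - 1."""
    k = len(returns)
    return math.exp(sum(math.log1p(r) for r in returns) / k) - 1.0

def annualized_arithmetic(returns):
    """First-order approximation of Eq. (1.3): the arithmetic mean of the returns."""
    return sum(returns) / len(returns)

# Hypothetical yearly simple returns (illustration only).
yearly = [0.10, -0.05, 0.20]

geo = annualized_geometric(yearly)
ari = annualized_arithmetic(yearly)
print(f"geometric: {geo:.5f}, arithmetic: {ari:.5f}")
```

With returns of this size the arithmetic figure overstates the geometric one by roughly half a percentage point, which illustrates why the approximation may be too coarse in some applications.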


Continuous Compounding 

Before introducing continuously compounded return, we discuss the effect of compounding.
Assume that the interest rate of a bank deposit is 10% per annum and
the initial deposit is $1.00. If the bank pays interest once a year, then the net value
of the deposit becomes \(\$1(1 + 0.1) = \$1.1\) one year later. If the bank pays interest
semiannually, the 6-month interest rate is 10%/2 = 5% and the net value is
\(\$1(1 + 0.1/2)^2 = \$1.1025\) after the first year. In general, if the bank pays interest
\(m\) times a year, then the interest rate for each payment is 10%/\(m\) and the net value
of the deposit becomes \(\$1(1 + 0.1/m)^m\) one year later. Table 1.1 gives the results
for some commonly used time intervals on a deposit of $1.00 with interest rate of
10% per annum. In particular, the net value approaches $1.1052, which is obtained
by \(\exp(0.1)\) and referred to as the result of continuous compounding. The effect of
compounding is clearly seen.


TABLE 1.1 Illustration of Effects of Compounding: Time Interval Is 1 Year and
Interest Rate Is 10% per Annum

Type            Number of Payments    Interest Rate per Period    Net Value
Annual                   1                   0.1                  $1.10000
Semiannual               2                   0.05                 $1.10250
Quarterly                4                   0.025                $1.10381
Monthly                 12                   0.0083               $1.10471
Weekly                  52                   0.1/52               $1.10506
Daily                  365                   0.1/365              $1.10516
Continuously            ∞                                         $1.10517


In general, the net asset value \(A\) of continuous compounding is

\[ A = C \exp(r \times n), \tag{1.4} \]

where \(r\) is the interest rate per annum, \(C\) is the initial capital, and \(n\) is the number
of years. From Eq. (1.4), we have

\[ C = A \exp(-r \times n), \tag{1.5} \]

which is referred to as the present value of an asset that is worth \(A\) dollars \(n\)
years from now, assuming that the continuously compounded interest rate is \(r\) per
annum.
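
The entries of Table 1.1 and the relations in Eqs. (1.4) and (1.5) are easy to reproduce. The Python sketch below is illustrative only; the function names `net_value`, `continuous_value`, and `present_value` are assumptions of this example.

```python
import math

def net_value(rate, m, capital=1.0):
    """Value after one year when interest is paid m times a year."""
    return capital * (1.0 + rate / m) ** m

def continuous_value(rate, years, capital=1.0):
    """Eq. (1.4): A = C exp(r x n)."""
    return capital * math.exp(rate * years)

def present_value(amount, rate, years):
    """Eq. (1.5): C = A exp(-r x n)."""
    return amount * math.exp(-rate * years)

schedule = [("Annual", 1), ("Semiannual", 2), ("Quarterly", 4),
            ("Monthly", 12), ("Weekly", 52), ("Daily", 365)]

for label, m in schedule:
    print(f"{label:<13}{m:>4}  ${net_value(0.10, m):.5f}")
print(f"{'Continuously':<13}{'':>4}  ${continuous_value(0.10, 1):.5f}")
```

As the number of payments grows, the printed values approach the continuously compounded figure, and `present_value` undoes `continuous_value` for the same rate and horizon.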


Continuously Compounded Return 
The natural logarithm of the simple gross return of an asset is called the continuously
compounded return or log return:

\[ r_t = \ln(1 + R_t) = \ln \frac{P_t}{P_{t-1}} = p_t - p_{t-1}, \tag{1.6} \]

where \(p_t = \ln(P_t)\). Continuously compounded returns \(r_t\) enjoy some advantages
over the simple net returns \(R_t\). First, consider multiperiod returns. We have

\[
\begin{aligned}
r_t[k] &= \ln(1 + R_t[k]) = \ln[(1 + R_t)(1 + R_{t-1}) \cdots (1 + R_{t-k+1})] \\
       &= \ln(1 + R_t) + \ln(1 + R_{t-1}) + \cdots + \ln(1 + R_{t-k+1}) \\
       &= r_t + r_{t-1} + \cdots + r_{t-k+1}.
\end{aligned}
\]

Thus, the continuously compounded multiperiod return is simply the sum of the
continuously compounded one-period returns involved. Second, statistical properties
of log returns are more tractable.
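
The additivity of log returns can be verified on any price path. The following Python sketch uses a hypothetical price series; `log_returns` is a name introduced here for illustration.

```python
import math

def log_returns(prices):
    """One-period log returns r_t = ln(P_t) - ln(P_{t-1}), as in Eq. (1.6)."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

# Hypothetical prices (illustration only).
prices = [100.0, 102.0, 99.96, 104.958]

r = log_returns(prices)
k_period_sum = sum(r)                                # r_t + r_{t-1} + ... + r_{t-k+1}
k_period_direct = math.log(prices[-1] / prices[0])   # ln(1 + R_t[k])

print(r)
print(k_period_sum, k_period_direct)   # equal up to floating-point error
```

Summation replaces the product needed for simple returns, which is one reason log returns are convenient in multiperiod work.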


Portfolio Return 

The simple net return of a portfolio consisting of \(N\) assets is a weighted average
of the simple net returns of the assets involved, where the weight on each asset is
the percentage of the portfolio's value invested in that asset. Let \(p\) be a portfolio
that places weight \(w_i\) on asset \(i\). Then the simple return of \(p\) at time \(t\) is
\(R_{p,t} = \sum_{i=1}^{N} w_i R_{it}\), where \(R_{it}\) is the simple return of asset \(i\).

The continuously compounded returns of a portfolio, however, do not have the
above convenient property. If the simple returns \(R_{it}\) are all small in magnitude, then
we have \(r_{p,t} \approx \sum_{i=1}^{N} w_i r_{it}\), where \(r_{p,t}\) is the continuously compounded return of
the portfolio at time \(t\). This approximation is often used to study portfolio returns.
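
The gap between the exact portfolio log return and its weighted-average approximation can be inspected numerically. This is a Python sketch with hypothetical weights and returns; none of the names or numbers come from the book.

```python
import math

def portfolio_simple_return(weights, simple_returns):
    """Exact portfolio simple return: the weighted average of asset simple returns."""
    return sum(w * R for w, R in zip(weights, simple_returns))

def portfolio_log_return_exact(weights, simple_returns):
    """Exact portfolio log return, ln(1 + R_p)."""
    return math.log1p(portfolio_simple_return(weights, simple_returns))

def portfolio_log_return_approx(weights, simple_returns):
    """Approximation: the weighted average of the asset log returns."""
    return sum(w * math.log1p(R) for w, R in zip(weights, simple_returns))

# Hypothetical weights and one-period simple returns (illustration only).
weights = [0.5, 0.3, 0.2]
R = [0.01, -0.02, 0.015]

exact = portfolio_log_return_exact(weights, R)
approx = portfolio_log_return_approx(weights, R)
print(f"exact: {exact:.6f}, approx: {approx:.6f}, gap: {exact - approx:.6f}")
```

For returns of a percent or two the gap is on the order of one basis point; it widens as the individual returns grow in magnitude.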


Dividend Payment 

If an asset pays dividends periodically, we must modify the definitions of asset
returns. Let \(D_t\) be the dividend payment of an asset between dates \(t-1\) and \(t\) and \(P_t\)
be the price of the asset at the end of period \(t\). Thus, dividend is not included in \(P_t\).
Then the simple net return and continuously compounded return at time \(t\) become

\[ R_t = \frac{P_t + D_t}{P_{t-1}} - 1, \qquad r_t = \ln(P_t + D_t) - \ln(P_{t-1}). \]
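
A short Python sketch of the dividend-adjusted definitions follows. The prices, the dividend amount, and the function names are hypothetical and serve only to illustrate the two formulas.

```python
import math

def total_simple_return(p_prev, p_now, dividend=0.0):
    """Simple net return with dividend: (P_t + D_t) / P_{t-1} - 1."""
    return (p_now + dividend) / p_prev - 1.0

def total_log_return(p_prev, p_now, dividend=0.0):
    """Log return with dividend: ln(P_t + D_t) - ln(P_{t-1})."""
    return math.log(p_now + dividend) - math.log(p_prev)

# Hypothetical example: price moves from 50 to 51 and a dividend of 0.5 is paid.
R_t = total_simple_return(50.0, 51.0, 0.5)
r_t = total_log_return(50.0, 51.0, 0.5)
print(R_t, r_t)
```

Ignoring the dividend in this example would give a simple return of 2% instead of 3%, so the adjustment matters whenever payouts are not negligible.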


Excess Return 
Excess return of an asset at time \(t\) is the difference between the asset's return and
the return on some reference asset. The reference asset is often taken to be riskless,
such as a short-term U.S. Treasury bill. The simple excess return and log
excess return of an asset are then defined as

\[ Z_t = R_t - R_{0t}, \qquad z_t = r_t - r_{0t}, \tag{1.7} \]

where \(R_{0t}\) and \(r_{0t}\) are the simple and log returns of the reference asset, respectively.
In the finance literature, the excess return is thought of as the payoff on an arbitrage
portfolio that goes long in an asset and short in the reference asset with no net
initial investment.


Remark. A long financial position means owning the asset. A short position 
involves selling an asset one does not own. This is accomplished by borrowing the 
asset from an investor who has purchased it. At some subsequent date, the short 
seller is obligated to buy exactly the same number of shares borrowed to pay back 
the lender. Because the repayment requires equal shares rather than equal dollars, 
the short seller benefits from a decline in the price of the asset. If cash dividends are 
paid on the asset while a short position is maintained, these are paid to the buyer 
of the short sale. The short seller must also compensate the lender by matching 
the cash dividends from his own resources. In other words, the short seller is also 
obligated to pay cash dividends on the borrowed asset to the lender. 


Summary of Relationship 
The relationships between simple return \(R_t\) and continuously compounded (or log)
return \(r_t\) are

\[ r_t = \ln(1 + R_t), \qquad R_t = e^{r_t} - 1. \]

If the returns \(R_t\) and \(r_t\) are in percentages, then

\[ r_t = 100 \ln\left(1 + \frac{R_t}{100}\right), \qquad R_t = 100\left(e^{r_t/100} - 1\right). \]

Temporal aggregation of the returns produces

\[
\begin{aligned}
1 + R_t[k] &= (1 + R_t)(1 + R_{t-1}) \cdots (1 + R_{t-k+1}), \\
r_t[k] &= r_t + r_{t-1} + \cdots + r_{t-k+1}.
\end{aligned}
\]

If the continuously compounded interest rate is \(r\) per annum, then the relationship
between present and future values of an asset is

\[ A = C \exp(r \times n), \qquad C = A \exp(-r \times n). \]


Example 1.1. If the monthly log return of an asset is 4.46%, then the corresponding
monthly simple return is \(100[\exp(4.46/100) - 1] = 4.56\%\). Also, if the
monthly log returns of the asset within a quarter are 4.46%, -7.34%, and 10.77%,
respectively, then the quarterly log return of the asset is \((4.46 - 7.34 + 10.77)\% =
7.89\%\).
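
The arithmetic of Example 1.1 can be reproduced in a few lines. The Python sketch below is only a check of the numbers in the example; the helper names are assumptions of this illustration.

```python
import math

def log_to_simple_pct(r_pct):
    """Convert a log return in percent to a simple return in percent."""
    return 100.0 * (math.exp(r_pct / 100.0) - 1.0)

def simple_to_log_pct(R_pct):
    """Convert a simple return in percent to a log return in percent."""
    return 100.0 * math.log(1.0 + R_pct / 100.0)

monthly_log = [4.46, -7.34, 10.77]     # monthly log returns (%) from Example 1.1

simple_first_month = log_to_simple_pct(monthly_log[0])
quarterly_log = sum(monthly_log)       # log returns add across periods

print(f"monthly simple return: {simple_first_month:.2f}%")
print(f"quarterly log return:  {quarterly_log:.2f}%")
```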


1.2 DISTRIBUTIONAL PROPERTIES OF RETURNS


To study asset returns, it is best to begin with their distributional properties.
The objective here is to understand the behavior of the returns across assets
and over time. Consider a collection of \(N\) assets held for \(T\) time periods, say,
\(t = 1, \ldots, T\). For each asset \(i\), let \(r_{it}\) be its log return at time \(t\). The log returns
under study are \(\{r_{it};\ i = 1, \ldots, N;\ t = 1, \ldots, T\}\). One can also consider the simple
returns \(\{R_{it};\ i = 1, \ldots, N;\ t = 1, \ldots, T\}\) and the log excess returns
\(\{z_{it};\ i = 1, \ldots, N;\ t = 1, \ldots, T\}\).


1.2.1 Review of Statistical Distributions and Their Moments 


We briefly review some basic properties of statistical distributions and the
moment equations of a random variable. Let \(R^k\) be the \(k\)-dimensional Euclidean
space. A point in \(R^k\) is denoted by \(x \in R^k\). Consider two random vectors
\(X = (X_1, \ldots, X_k)'\) and \(Y = (Y_1, \ldots, Y_q)'\). Let \(P(X \in A, Y \in B)\) be the probability
that \(X\) is in the subspace \(A \subset R^k\) and \(Y\) is in the subspace \(B \subset R^q\). For
most of the cases considered in this book, both random vectors are assumed to be
continuous.


Joint Distribution 
The function

\[ F_{X,Y}(x, y; \theta) = P(X \le x, Y \le y; \theta), \]

where \(x \in R^k\), \(y \in R^q\), and the inequality \(\le\) is a component-by-component
operation, is a joint distribution function of \(X\) and \(Y\) with parameter \(\theta\). Behavior of \(X\)
and \(Y\) is characterized by \(F_{X,Y}(x, y; \theta)\). If the joint probability density function
\(f_{x,y}(x, y; \theta)\) of \(X\) and \(Y\) exists, then

\[ F_{X,Y}(x, y; \theta) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{x,y}(w, z; \theta)\, dz\, dw. \]

In this case, \(X\) and \(Y\) are continuous random vectors.


Marginal Distribution 
The marginal distribution of \(X\) is given by

\[ F_X(x; \theta) = F_{X,Y}(x, \infty, \ldots, \infty; \theta). \]

Thus, the marginal distribution of \(X\) is obtained by integrating out \(Y\). A similar
definition applies to the marginal distribution of \(Y\).

If \(k = 1\), \(X\) is a scalar random variable and the distribution function becomes

\[ F_X(x) = P(X \le x; \theta), \]

which is known as the cumulative distribution function (CDF) of \(X\). The CDF of a
random variable is nondecreasing [i.e., \(F_X(x_1) \le F_X(x_2)\) if \(x_1 \le x_2\)] and satisfies
\(F_X(-\infty) = 0\) and \(F_X(\infty) = 1\). For a given probability \(p\), the smallest real number
\(x_p\) such that \(p \le F_X(x_p)\) is called the 100\(p\)th quantile of the random variable \(X\).
More specifically,

\[ x_p = \inf_x \{x \mid p \le F_X(x)\}. \]

We use the CDF to compute the p value of a test statistic in the book.
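
The infimum definition of a quantile can be made concrete by applying it to an empirical CDF. The Python sketch below assumes a small hypothetical sample and replaces \(F_X\) by the empirical CDF of that sample; it illustrates the definition and is not a recommended estimator.

```python
def empirical_cdf(sample, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def quantile(sample, p):
    """Smallest observed x with p <= F(x), where F is the empirical CDF."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    for x in sorted(sample):
        if p <= empirical_cdf(sample, x):
            return x
    return max(sample)  # fallback; reached only if rounding defeats the loop

# Hypothetical returns (illustration only).
sample = [0.02, -0.03, 0.0, 0.01, -0.01]

print(quantile(sample, 0.05))   # lower tail
print(quantile(sample, 0.5))    # median under the infimum definition
print(quantile(sample, 1.0))    # largest observation
```

Because the empirical CDF is a step function, the quantile is always one of the observed values; this is exactly what the infimum in the definition selects.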


Conditional Distribution 
The conditional distribution of \(X\) given \(Y \le y\) is given by

\[ F_{X \mid Y \le y}(x; \theta) = \frac{P(X \le x, Y \le y; \theta)}{P(Y \le y; \theta)}. \]

If the probability density functions involved exist, then the conditional density of
\(X\) given \(Y = y\) is

\[ f_{x \mid y}(x; \theta) = \frac{f_{x,y}(x, y; \theta)}{f_y(y; \theta)}, \tag{1.8} \]

where the marginal density function \(f_y(y; \theta)\) is obtained by

\[ f_y(y; \theta) = \int_{-\infty}^{\infty} f_{x,y}(x, y; \theta)\, dx. \]

From Eq. (1.8), the relation among joint, marginal, and conditional distributions is

\[ f_{x,y}(x, y; \theta) = f_{x \mid y}(x; \theta) \times f_y(y; \theta). \tag{1.9} \]

This identity is used extensively in time series analysis (e.g., in maximum-likelihood
estimation). Finally, \(X\) and \(Y\) are independent random vectors if and
only if \(f_{x \mid y}(x; \theta) = f_x(x; \theta)\). In this case, \(f_{x,y}(x, y; \theta) = f_x(x; \theta) f_y(y; \theta)\).
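
As a brief illustration of how Eq. (1.9) is used in likelihood calculations, the factorization can be applied repeatedly to a sequence of observations. The sketch below assumes a series \(r_1, \ldots, r_T\) with joint density \(f\) and parameter \(\theta\); the notation is introduced here only to show the mechanics.

```latex
\begin{aligned}
f(r_1, \ldots, r_T; \theta)
  &= f(r_T \mid r_{T-1}, \ldots, r_1; \theta)\, f(r_1, \ldots, r_{T-1}; \theta) \\
  &= f(r_T \mid r_{T-1}, \ldots, r_1; \theta)\,
     f(r_{T-1} \mid r_{T-2}, \ldots, r_1; \theta)\, f(r_1, \ldots, r_{T-2}; \theta) \\
  &= \cdots \\
  &= f(r_1; \theta) \prod_{t=2}^{T} f(r_t \mid r_{t-1}, \ldots, r_1; \theta).
\end{aligned}
```

Each step peels off the most recent observation as a conditional density, so the joint density of a dependent series becomes a product of one-step conditional densities and a marginal density for the first observation.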


Moments of a Random Variable 
The \(\ell\)th moment of a continuous random variable \(X\) is defined as

\[ m'_\ell = E(X^\ell) = \int_{-\infty}^{\infty} x^\ell f(x)\, dx, \]

where \(E\) stands for expectation and \(f(x)\) is the probability density function of \(X\).
The first moment is called the mean or expectation of \(X\). It measures the central
location of the distribution. We denote the mean of \(X\) by \(\mu_x\). The \(\ell\)th central
moment of \(X\) is defined as

\[ m_\ell = E[(X - \mu_x)^\ell] = \int_{-\infty}^{\infty} (x - \mu_x)^\ell f(x)\, dx, \]

provided that the integral exists. The second central moment, denoted by \(\sigma_x^2\),
measures the variability of \(X\) and is called the variance of \(X\). The positive square root,
\(\sigma_x\), of variance is the standard deviation of \(X\). The first two moments of a random
variable uniquely determine a normal distribution. For other distributions, higher
order moments are also of interest.

The third central moment measures the symmetry of \(X\) with respect to its mean,
whereas the fourth central moment measures the tail behavior of \(X\). In statistics,
skewness and kurtosis, which are normalized third and fourth central moments
of \(X\), are often used to summarize the extent of asymmetry and tail thickness.
Specifically, the skewness and kurtosis of \(X\) are defined as

\[ S(x) = E\left[ \frac{(X - \mu_x)^3}{\sigma_x^3} \right], \qquad K(x) = E\left[ \frac{(X - \mu_x)^4}{\sigma_x^4} \right]. \]

The quantity K(x) — 3 is called the excess kurtosis because K(x) = 3 for a nor- 
mal distribution. Thus, the excess kurtosis of a normal random variable is zero. 
A distribution with positive excess kurtosis is said to have heavy tails, implying 
that the distribution puts more mass on the tails of its support than a normal distri- 
bution does. In practice, this means that a random sample from such a distribution 
tends to contain more extreme values. Such a distribution is said to be leptokur- 
tic. On the other hand, a distribution with negative excess kurtosis has short tails 
(e.g., a uniform distribution over a finite interval). Such a distribution is said to be 
platykurtic. 
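
As a small numerical check of these definitions, the following sketch evaluates S(x) and K(x) − 3 by numerical integration for a uniform and a standard normal density. The helper names `central_moment` and `shape_measures` are illustrative and not part of any package.

```r
# Central moment of order k for a density f with mean mu, by numerical integration.
central_moment <- function(f, mu, k, lower, upper) {
  integrate(function(x) (x - mu)^k * f(x), lower, upper)$value
}

# Skewness and excess kurtosis computed from the definitions of S(x) and K(x).
shape_measures <- function(f, lower, upper) {
  mu <- integrate(function(x) x * f(x), lower, upper)$value
  s2 <- central_moment(f, mu, 2, lower, upper)
  c(skewness        = central_moment(f, mu, 3, lower, upper) / s2^1.5,
    excess_kurtosis = central_moment(f, mu, 4, lower, upper) / s2^2 - 3)
}

unif_shape <- shape_measures(dunif, 0, 1)       # uniform on (0, 1)
norm_shape <- shape_measures(dnorm, -Inf, Inf)  # standard normal
print(round(unif_shape, 4))  # excess kurtosis is about -1.2 (platykurtic)
print(round(norm_shape, 4))  # both measures are about 0
```

The uniform case gives a negative excess kurtosis, in line with the short-tail description above, while the normal case returns values near zero up to integration error.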

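In application, skewness and kurtosis can be estimated by their sample counterparts.
Let $\{x_1, \ldots, x_T\}$ be a random sample of X with T observations. The sample
mean is

$$\hat{\mu}_x = \frac{1}{T} \sum_{t=1}^{T} x_t, \tag{1.10}$$

the sample variance is

$$\hat{\sigma}_x^2 = \frac{1}{T-1} \sum_{t=1}^{T} (x_t - \hat{\mu}_x)^2, \tag{1.11}$$

the sample skewness is

$$\hat{S}(x) = \frac{1}{(T-1)\hat{\sigma}_x^3} \sum_{t=1}^{T} (x_t - \hat{\mu}_x)^3, \tag{1.12}$$

and the sample kurtosis is

$$\hat{K}(x) = \frac{1}{(T-1)\hat{\sigma}_x^4} \sum_{t=1}^{T} (x_t - \hat{\mu}_x)^4. \tag{1.13}$$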
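
The sample statistics in Eqs. (1.10)-(1.13) can be written directly in base R. The sketch below uses a hypothetical helper `sample_moments` and made-up data. Packaged functions may use slightly different divisors (for example T instead of T − 1), so small numerical differences from these formulas are possible.

```r
# Sample moments following Eqs. (1.10)-(1.13); x is a numeric vector.
sample_moments <- function(x) {
  n_obs <- length(x)
  mu <- sum(x) / n_obs                              # Eq. (1.10)
  s2 <- sum((x - mu)^2) / (n_obs - 1)               # Eq. (1.11)
  sk <- sum((x - mu)^3) / ((n_obs - 1) * s2^1.5)    # Eq. (1.12)
  ku <- sum((x - mu)^4) / ((n_obs - 1) * s2^2)      # Eq. (1.13), regular kurtosis
  c(mean = mu, variance = s2, skewness = sk, kurtosis = ku)
}

x <- c(-2.1, 0.4, 1.3, -0.7, 0.2, 3.5, -1.0, 0.1)   # made-up illustrative data
m <- sample_moments(x)
print(m)
print(m[["kurtosis"]] - 3)                          # sample excess kurtosis
```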
Under the normality assumption, $\hat{S}(x)$ and $\hat{K}(x) - 3$ are distributed asymptotically
as normal with zero mean and variances 6/T and 24/T, respectively; see
Snedecor and Cochran (1980, p. 78). These asymptotic properties can be used to
test the normality of asset returns. Given an asset return series $\{r_1, \ldots, r_T\}$, to
test the skewness of the returns, we consider the null hypothesis $H_0: S(r) = 0$
versus the alternative hypothesis $H_a: S(r) \neq 0$. The t-ratio statistic of the sample
skewness in Eq. (1.12) is

$$t = \frac{\hat{S}(r)}{\sqrt{6/T}}.$$


The decision rule is as follows. Reject the null hypothesis at the $\alpha$ significance
level, if $|t| > Z_{\alpha/2}$, where $Z_{\alpha/2}$ is the upper $100(\alpha/2)$th quantile of the standard
normal distribution. Alternatively, one can compute the p value of the test statistic
t and reject $H_0$ if and only if the p value is less than $\alpha$.

Similarly, one can test the excess kurtosis of the return series using the hypotheses
$H_0: K(r) - 3 = 0$ versus $H_a: K(r) - 3 \neq 0$. The test statistic is

$$t = \frac{\hat{K}(r) - 3}{\sqrt{24/T}},$$

which is asymptotically a standard normal random variable. The decision rule is to
reject $H_0$ if and only if the p value of the test statistic is less than the significance
level $\alpha$. Jarque and Bera (1987) (JB) combine the two prior tests and use the test
statistic

$$JB = \frac{\hat{S}^2(r)}{6/T} + \frac{[\hat{K}(r) - 3]^2}{24/T},$$

which is asymptotically distributed as a chi-squared random variable with 2 degrees
of freedom, to test for the normality of $r_t$. One rejects $H_0$ of normality if the p
value of the JB statistic is less than the significance level.
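
The three tests can be collected in a few lines of base R. The function name `normality_tests` and the simulated heavy-tailed data are assumptions made for illustration. This is a sketch of the formulas above, not the implementation used by any package.

```r
# Skewness, excess-kurtosis, and Jarque-Bera tests from the asymptotic results.
# r: numeric vector of returns; sample moments follow Eqs. (1.10)-(1.13).
normality_tests <- function(r) {
  n  <- length(r)
  mu <- mean(r)
  s  <- sd(r)                                    # divisor n - 1, as in Eq. (1.11)
  sk <- sum((r - mu)^3) / ((n - 1) * s^3)        # sample skewness
  ek <- sum((r - mu)^4) / ((n - 1) * s^4) - 3    # sample excess kurtosis
  t_skew <- sk / sqrt(6 / n)
  t_kurt <- ek / sqrt(24 / n)
  jb     <- t_skew^2 + t_kurt^2                  # Jarque-Bera statistic
  list(t_skew = t_skew, p_skew = 2 * pnorm(-abs(t_skew)),
       t_kurt = t_kurt, p_kurt = 2 * pnorm(-abs(t_kurt)),
       jb = jb, p_jb = pchisq(jb, df = 2, lower.tail = FALSE))
}

set.seed(1)                  # simulated data, for illustration only
heavy <- rt(2000, df = 4)    # Student-t draws with 4 degrees of freedom (heavy tails)
res <- normality_tests(heavy)
print(unlist(res))
```

Writing the JB statistic as the sum of the two squared t-ratios makes explicit that it pools the skewness and kurtosis evidence into a single chi-squared test with 2 degrees of freedom.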


Example 1.2. Consider the daily simple returns of the International Business 
Machines (IBM) stock used in Table 1.2. The sample skewness and kurtosis of 
the returns are parts of the descriptive (or summary) statistics that can be obtained 
easily using various statistical software packages. Both R and S-Plus are used in 
the demonstration, where d-ibm3dx7008.txt is the data file name. Note that in 
R the kurtosis denotes excess kurtosis. From the output, the excess kurtosis is high, 
indicating that the daily simple returns of IBM stock have heavy tails. To test the
symmetry of the return distribution, we use the test statistic

$$t = \frac{0.0614}{\sqrt{6/9845}} = \frac{0.0614}{0.0247} = 2.49,$$

which gives a p value of about 0.013, indicating that the daily simple returns of
IBM stock are significantly skewed to the right at the 5% level.
TABLE 1.2 Descriptive Statistics for Daily and Monthly Simple and Log Returns of
Selected Indexes and Stocks^a

                                   Standard              Excess
Security  Start      Size    Mean Deviation  Skewness  Kurtosis  Minimum  Maximum

Daily Simple Returns (%)
SP        70/01/02   9845   0.029     1.056     -0.73     22.81   -20.47    11.58
VW        70/01/02   9845   0.040     1.004     -0.62     18.02   -17.13    11.52
EW        70/01/02   9845   0.076     0.814     -0.77     17.08   -10.39    10.74
IBM       70/01/02   9845   0.040     1.693      0.06      9.92   -22.96    13.16
Intel     72/12/15   9096   0.108     2.891     -0.15      6.13   -29.57    26.38
3M        70/01/02   9845   0.045     1.482     -0.36     13.34   -25.98    11.54
Microsoft 86/03/14   5752   0.123     2.359     -0.13      9.92   -30.12    19.57
Citi-Grp  86/10/30   5592   0.067     2.602      1.80     55.25   -26.41    57.82

Daily Log Returns (%)
SP        70/01/02   9845   0.023     1.062     -1.17     30.20   -22.90    10.96
VW        70/01/02   9845   0.035     1.008     -0.94     21.56   -18.80    10.90
EW        70/01/02   9845   0.072     0.816     -1.00     17.76   -10.97    10.20
IBM       70/01/02   9845   0.026     1.694     -0.27     12.17   -26.09    12.37
Intel     72/12/15   9096   0.066     2.905     -0.54      7.81   -35.06    23.41
3M        70/01/02   9845   0.034     1.488     -0.78     20.57   -30.08    10.92
Microsoft 86/03/14   5752   0.095     2.369     -0.63     14.23   -35.83    17.87
Citi-Grp  86/10/30   5592   0.033     2.575      0.22     33.19   -30.66    45.63

Monthly Simple Returns (%)
SP        26/01       996    0.58      5.53      0.32      9.47   -29.94    42.22
VW        26/01       996    0.89      5.43      0.15      7.69   -29.01    38.37
EW        26/01       996    1.22      7.40      1.52     14.94   -31.28    66.59
IBM       26/01       996    1.35      7.15      0.44      3.43   -26.19    47.06
Intel     73/01       432    2.21     12.85      0.32      2.70   -44.87    62.50
3M        46/02       755    1.24      6.45      0.22      0.98   -27.83    25.80
Microsoft 86/04       273    2.62     11.08      0.66      1.96   -34.35    51.55
Citi-Grp  86/11       266    1.17      9.75     -0.47      1.77   -39.27    26.08

Monthly Log Returns (%)
SP        26/01       996    0.43      5.54     -0.52      7.93   -35.58    39.22
VW        26/01       996    0.74      5.43     -0.58      6.85   -34.22    32.47
EW        26/01       996    0.96      7.14      0.25      6.55   -37.51    51.04
IBM       26/01       996    1.09      7.03     -0.07      2.62   -30.37    38.57
Intel     73/01       432    1.39     12.80     -0.55      3.06   -59.54    48.55
3M        46/02       755    1.03      6.37     -0.08      1.25   -32.61    22.95
Microsoft 86/04       273    2.01     10.66      0.10      1.59   -42.09    41.58
Citi-Grp  86/11       266    0.68     10.09     -1.09      3.76   -49.87    23.18

^a Returns are in percentages and the sample period ends on December 31, 2008. The statistics are defined
in Eqs. (1.10)-(1.13), and VW, EW, and SP denote the value-weighted, equal-weighted, and S&P composite
indexes, respectively.

R Demonstration
In the following program code, > is the prompt character and % denotes an
explanation:


> library(fBasics) % Load the package fBasics.
> da=read.table("d-ibm3dx7008.txt",header=T) % Load the data.
% header=T means 1st row of the data file contains
% variable names. The default is header=F, i.e., no names.
> dim(da) % Find size of the data: 9845 rows and 5 columns.
[1] 9845 5
> da[1,] % See the first row of the data
      Date      rtn   vwretd  ewretd   sprtrn % column names
1 19700102 0.000686 0.012137 0.03345 0.010211
> ibm=da[,2] % Obtain IBM simple returns
> sibm=ibm*100 % Percentage simple returns
> basicStats(sibm) % Compute the summary statistics
                   sibm
nobs        9845.000000 % Sample size
NAs            0.000000 % Number of missing values
Minimum      -22.963000
Maximum       13.163600
1. Quartile   -0.857100 % 25th percentile
3. Quartile    0.883300 % 75th percentile
Mean           0.040161 % Sample mean
Median         0.000000 % Sample median
Sum          395.387600 % Sum of the percentage simple returns
SE Mean        0.017058 % Standard error of the sample mean
LCL Mean       0.006724 % Lower bound of 95% conf.
                        % interval for mean
UCL Mean       0.073599 % Upper bound of 95% conf.
                        % interval for mean
Variance       2.864705 % Sample variance
Stdev          1.692544 % Sample standard deviation
Skewness       0.061399 % Sample skewness
Kurtosis       9.916359 % Sample excess kurtosis.

% Alternatively, one can use individual commands as follows:
> mean(sibm)
[1] 0.04016126
> var(sibm)
[1] 2.864705
> sqrt(var(sibm)) % Standard deviation
[1] 1.692544
> skewness(sibm)
[1] 0.06139878
attr(,"method")
[1] "moment"
> kurtosis(sibm)
[1] 9.91636
attr(,"method")
[1] "excess"

% Simple tests
> s1=skewness(sibm)
> t1=s1/sqrt(6/9845) % Compute test statistic
> t1
[1] 2.487093
> pv=2*(1-pnorm(t1)) % Compute p-value.
> pv
[1] 0.01287919

% Turn to log returns in percentages
> libm=log(ibm+1)*100
> t.test(libm) % Test mean being zero.
One Sample t-test
data: libm
t = 1.5126, df = 9844, p-value = 0.1304
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.007641473 0.059290531
% The result shows that the hypothesis of zero expected return
% cannot be rejected at the 5% or 10% level.

> normalTest(libm,method='jb') % Normality test
Title:
Jarque - Bera Normality Test
Test Results:
STATISTIC:
X-squared: 60921.9343
P VALUE:
Asymptotic p Value: < 2.2e-16
% The result shows the normality for log-return is rejected.


S-Plus Demonstration 
In the following program code > is the prompt character and % marks explanation: 


> module(finmetrics) % Load the Finmetrics module.
> da=read.table("d-ibm3dx7008.txt",header=T) % Load data.
> dim(da) % Obtain the size of the data set.
[1] 9845 5
> da[1,] % See the first row of the data
      Date      rtn   vwretd  ewretd   sprtrn
1 19700102 0.000686 0.012137 0.03345 0.010211
> sibm=da[,2]*100 % Obtain percentage simple returns of
                  % IBM stock.
> summaryStats(sibm) % Obtain summary statistics
Sample Quantiles:
    min      1Q median     3Q   max
 -22.96 -0.8571      0 0.8833 13.16
Sample Moments:
    mean   std skewness kurtosis
 0.04016 1.693  0.06141    12.92
Number of Observations: 9845
% simple tests
> s1=skewness(sibm) % Compute skewness
> t=s1/sqrt(6/9845) % Perform test of skewness
> t
[1] 2.487851
> pv=2*(1-pnorm(t)) % Calculate p-value.
> pv
[1] 0.01285177

> libm=log(da[,2]+1)*100 % Turn to log-return
> t.test(libm) % Test expected return being zero.
One-sample t-Test
data: libm
t = 1.5126, df = 9844, p-value = 0.1304
alternative hypothesis: mean is not equal to 0
95 percent confidence interval:
-0.007641473 0.059290531

> normalTest(libm,method='jb') % Normality test
Test for Normality: Jarque-Bera
Null Hypothesis: data is normally distributed
Test Stat 60921.93
p.value 0.00
Dist. under Null: chi-square with 2 degrees of freedom
Total Observ.: 9845


Remark. In S-Plus, kurtosis is the regular kurtosis, not excess kurtosis. That 
is, S-Plus does not subtract 3 from the sample kurtosis. Also, in many cases R and 
S-Plus use the same commands. 


1.2.2 Distributions of Returns 


The most general model for the log returns $\{r_{it}; i = 1, \ldots, N; t = 1, \ldots, T\}$ is its
joint distribution function:

$$F_r(r_{11}, \ldots, r_{N1}; r_{12}, \ldots, r_{N2}; \ldots; r_{1T}, \ldots, r_{NT}; Y; \theta), \tag{1.14}$$
where Y is a state vector consisting of variables that summarize the environment
in which asset returns are determined and $\theta$ is a vector of parameters that uniquely
determines the distribution function $F_r(\cdot)$. The probability distribution $F_r(\cdot)$ governs
the stochastic behavior of the returns $r_{it}$ and Y. In many financial studies, the
state vector Y is treated as given and the main concern is the conditional distribution
of $\{r_{it}\}$ given Y. Empirical analysis of asset returns is then to estimate the
unknown parameter $\theta$ and to draw statistical inference about the behavior of $\{r_{it}\}$
given some past log returns.

The model in Eq. (1.14) is too general to be of practical value. However, it
provides a general framework with respect to which an econometric model for
asset returns $r_{it}$ can be put in a proper perspective.

Some financial theories such as the capital asset pricing model (CAPM) of
Sharpe (1964) focus on the joint distribution of N returns at a single time index
t (i.e., the distribution of $\{r_{1t}, \ldots, r_{Nt}\}$). Other theories emphasize the dynamic
structure of individual asset returns (i.e., the distribution of $\{r_{i1}, \ldots, r_{iT}\}$ for a
given asset i). In this book, we focus on both. In the univariate analysis of Chapters
2-7, our main concern is the joint distribution of $\{r_{it}\}_{t=1}^{T}$ for asset i. To this end,
it is useful to partition the joint distribution as

$$\begin{aligned}
F(r_{i1}, \ldots, r_{iT}; \theta) &= F(r_{i1}) F(r_{i2} | r_{i1}) \cdots F(r_{iT} | r_{i,T-1}, \ldots, r_{i1}) \\
&= F(r_{i1}) \prod_{t=2}^{T} F(r_{it} | r_{i,t-1}, \ldots, r_{i1}),
\end{aligned} \tag{1.15}$$


where, for simplicity, the parameter $\theta$ is omitted. This partition highlights the
temporal dependencies of the log return $r_{it}$. The main issue then is the specification
of the conditional distribution $F(r_{it} | r_{i,t-1}, \cdot)$, in particular, how the conditional
distribution evolves over time. In finance, different distributional specifications
lead to different theories. For instance, one version of the random-walk hypothesis
is that the conditional distribution $F(r_{it} | r_{i,t-1}, \ldots, r_{i1})$ is equal to the marginal
distribution $F(r_{it})$. In this case, returns are temporally independent and, hence, not
predictable.


It is customary to treat asset returns as continuous random variables, especially 
for index returns or stock returns calculated at a low frequency, and use their 
probability density functions. In this case, using the identity in Eq. (1.9), we can 
write the partition in Eq. (1.15) as 


$$f(r_{i1}, \ldots, r_{iT}; \theta) = f(r_{i1}; \theta) \prod_{t=2}^{T} f(r_{it} | r_{i,t-1}, \ldots, r_{i1}; \theta). \tag{1.16}$$
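
To see the factorization in Eq. (1.16) at work, the sketch below compares a joint density with the product of conditional densities for three consecutive observations. The Gaussian first-order autoregressive model and its parameter values are assumptions chosen only for illustration.

```r
# Check f(r1, r2, r3) = f(r1) f(r2 | r1) f(r3 | r2, r1) for a Gaussian AR(1)
# process r_t = phi * r_{t-1} + a_t with a_t ~ N(0, sigma^2) (hypothetical model).
phi <- 0.5; sigma <- 1.2
r <- c(0.3, -0.8, 1.1)                      # arbitrary evaluation point

# Left side: trivariate normal density with Cov(r_s, r_t) = g0 * phi^|s - t|.
g0    <- sigma^2 / (1 - phi^2)              # stationary variance
Sigma <- g0 * phi^abs(outer(1:3, 1:3, "-"))
joint <- exp(-0.5 * drop(t(r) %*% solve(Sigma, r))) /
  sqrt((2 * pi)^3 * det(Sigma))

# Right side: marginal density of r1 times the one-step conditional densities.
# For an AR(1) process, r_t given the past depends only on r_{t-1}.
product <- dnorm(r[1], 0, sqrt(g0)) *
  dnorm(r[2], phi * r[1], sigma) *
  dnorm(r[3], phi * r[2], sigma)

print(c(joint = joint, product = product))  # the two values agree
```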


For high-frequency asset returns, discreteness becomes an issue. For example, stock
prices change in multiples of a tick size on the New York Stock Exchange (NYSE).
The tick size was $\frac{1}{8}$ of a dollar before July 1997 and was $\frac{1}{16}$ of a dollar from July
1997 to January 2001. Therefore, the tick-by-tick return of an individual stock listed
on the NYSE is not continuous. We discuss high-frequency stock price changes
and time durations between price changes later in Chapter 5.


Remark. On August 28, 2000, the NYSE began a pilot program with 7 stocks 
priced in decimals and the American Stock Exchange (AMEX) began a pilot pro- 
gram with 6 stocks and two options classes. The NYSE added 57 stocks and 94 
stocks to the program on September 25 and December 4, 2000, respectively. All 
NYSE and AMEX stocks started trading in decimals on January 29, 2001. 


Equation (1.16) suggests that conditional distributions are more relevant than 
marginal distributions in studying asset returns. However, the marginal distributions 
may still be of some interest. In particular, it is easier to estimate marginal distribu- 
tions than conditional distributions using past returns. In addition, in some cases, 
asset returns have weak empirical serial correlations, and, hence, their marginal 
distributions are close to their conditional distributions. 

Several statistical distributions have been proposed in the literature for the 
marginal distributions of asset returns, including normal distribution, lognormal dis- 
tribution, stable distribution, and scale mixture of normal distributions. We briefly 
discuss these distributions. 


Normal Distribution 

A traditional assumption made in financial study is that the simple returns $\{R_{it} | t =
1, \ldots, T\}$ are independently and identically distributed as normal with fixed mean
and variance. This assumption makes statistical properties of asset returns tractable.
But it encounters several difficulties. First, the lower bound of a simple return is
−1. Yet the normal distribution may assume any value on the real line and, hence,
has no lower bound. Second, if $R_{it}$ is normally distributed, then the multiperiod
simple return $R_{it}[k]$ is not normally distributed because it is a product of one-period
returns. Third, the normality assumption is not supported by many empirical asset
returns, which tend to have a positive excess kurtosis.


Lognormal Distribution 

Another commonly used assumption is that the log returns $r_t$ of an asset are independent
and identically distributed (iid) as normal with mean $\mu$ and variance $\sigma^2$.
The simple returns are then iid lognormal random variables with mean and variance
given by

$$E(R_t) = \exp\left(\mu + \frac{\sigma^2}{2}\right) - 1, \qquad \mathrm{Var}(R_t) = \exp(2\mu + \sigma^2)[\exp(\sigma^2) - 1]. \tag{1.17}$$


These two equations are useful in studying asset returns (e.g., in forecasting using
models built for log returns). Alternatively, let $m_1$ and $m_2$ be the mean and variance
of the simple return $R_t$, which is lognormally distributed. Then the mean and
variance of the corresponding log return $r_t$ are

$$E(r_t) = \ln\left[\frac{m_1 + 1}{\sqrt{1 + m_2/(1 + m_1)^2}}\right], \qquad \mathrm{Var}(r_t) = \ln\left[1 + \frac{m_2}{(1 + m_1)^2}\right].$$


Because the sum of a finite number of iid normal random variables is normal,
$r_t[k]$ is also normally distributed under the normal assumption for $\{r_t\}$. In addition,
there is no lower bound for $r_t$, and the lower bound for $R_t$ is satisfied using
$1 + R_t = \exp(r_t)$. However, the lognormal assumption is not consistent with all
the properties of historical stock returns. In particular, many stock returns exhibit
a positive excess kurtosis.
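
The two sets of moment formulas in this subsection are inverses of each other, which can be verified numerically. In the sketch below the function names and the parameter values are illustrative assumptions, not estimates from data.

```r
# Moments of the simple return implied by normal log returns, Eq. (1.17).
simple_from_log <- function(mu, sigma2) {
  c(m1 = exp(mu + sigma2 / 2) - 1,
    m2 = exp(2 * mu + sigma2) * (exp(sigma2) - 1))
}

# Inverse map: mean and variance of the log return from m1 and m2.
log_from_simple <- function(m1, m2) {
  ratio <- m2 / (1 + m1)^2
  c(mu     = log((m1 + 1) / sqrt(1 + ratio)),
    sigma2 = log(1 + ratio))
}

mu <- 0.01; sigma2 <- 0.05^2         # illustrative values, not estimates
m    <- simple_from_log(mu, sigma2)
back <- log_from_simple(m[["m1"]], m[["m2"]])
print(m)
print(back)                          # recovers mu and sigma2
```

Note that the mean simple return exceeds exp(mu) − 1 by the factor exp(sigma2/2), which is the adjustment needed when forecasts from a log-return model are converted to simple returns.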


Stable Distribution 

The stable distributions are a natural generalization of normal in that they are stable
under addition, which meets the need of continuously compounded returns $r_t$.
Furthermore, stable distributions are capable of capturing excess kurtosis shown 
by historical stock returns. However, nonnormal stable distributions do not have 
a finite variance, which is in conflict with most finance theories. In addition, sta- 
tistical modeling using nonnormal stable distributions is difficult. An example of 
nonnormal stable distributions is the Cauchy distribution, which is symmetric with 
respect to its median but has infinite variance. 


Scale Mixture of Normal Distributions 

Recent studies of stock returns tend to use scale mixture or finite mixture of normal
distributions. Under the assumption of scale mixture of normal distributions, the log
return $r_t$ is normally distributed with mean $\mu$ and variance $\sigma^2$ [i.e., $r_t \sim N(\mu, \sigma^2)$].
However, $\sigma^2$ is a random variable that follows a positive distribution (e.g., $\sigma^{-2}$
follows a gamma distribution). An example of finite mixture of normal distributions
is

$$r_t \sim (1 - X) N(\mu, \sigma_1^2) + X N(\mu, \sigma_2^2),$$

where X is a Bernoulli random variable such that $P(X = 1) = \alpha$ and $P(X = 0) =
1 - \alpha$ with $0 < \alpha < 1$, $\sigma_1^2$ is small, and $\sigma_2^2$ is relatively large. For instance, with
$\alpha = 0.05$, the finite mixture says that 95% of the returns follow $N(\mu, \sigma_1^2)$ and 5%
follow $N(\mu, \sigma_2^2)$. The large value of $\sigma_2^2$ enables the mixture to put more mass at the
tails of its distribution. The low percentage of returns that are from $N(\mu, \sigma_2^2)$ says
that the majority of the returns follow a simple normal distribution. Advantages
of mixtures of normal include that they maintain the tractability of normal, have
finite higher order moments, and can capture the excess kurtosis. Yet it is hard to
estimate the mixture parameters (e.g., the $\alpha$ in the finite-mixture case).

Figure 1.1 shows the probability density functions of a finite mixture of normal,
Cauchy, and standard normal random variable. The finite mixture of normal is
$(1 - X)N(0, 1) + X \times N(0, 16)$ with X being Bernoulli such that $P(X = 1) =
0.05$, and the density function of Cauchy is

$$f(x) = \frac{1}{\pi(1 + x^2)}, \qquad -\infty < x < \infty.$$

It is seen that the Cauchy distribution has fatter tails than the finite mixture of
normal, which, in turn, has fatter tails than the standard normal.

Figure 1.1 Comparison of finite mixture, stable, and standard normal density functions.
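
The tail ordering described above can be checked numerically for the mixture (1 − X)N(0, 1) + X × N(0, 16) with P(X = 1) = 0.05. The sketch below also computes the excess kurtosis of this mixture from its component moments. The helper name `dmix` and the evaluation point x = 4 are choices made for illustration.

```r
alpha <- 0.05; s1 <- 1; s2 <- 4      # standard deviations of N(0, 1) and N(0, 16)

# Density of the finite mixture (1 - X) N(0, 1) + X N(0, 16).
dmix <- function(x) (1 - alpha) * dnorm(x, sd = s1) + alpha * dnorm(x, sd = s2)

# Both components have mean 0, so the mixture moments are probability-weighted
# averages of the component moments (a N(0, s^2) variable has E(Z^4) = 3 s^4).
v  <- (1 - alpha) * s1^2 + alpha * s2^2
m4 <- 3 * ((1 - alpha) * s1^4 + alpha * s2^4)
excess_kurtosis <- m4 / v^2 - 3
print(excess_kurtosis)               # about 10.47

# Tail comparison at x = 4: Cauchy > mixture > standard normal.
tails <- c(cauchy = dcauchy(4), mixture = dmix(4), normal = dnorm(4))
print(signif(tails, 3))
```

Even with only 5% of the mass in the high-variance component, the mixture has a large positive excess kurtosis while all of its moments remain finite, unlike the Cauchy case.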


1.2.3 Multivariate Returns 


Let $\mathbf{r}_t = (r_{1t}, \ldots, r_{Nt})'$ be the log returns of N assets at time t. The multivariate
analyses of Chapters 8 and 10 are concerned with the joint distribution of $\{\mathbf{r}_t\}_{t=1}^{T}$.
This joint distribution can be partitioned in the same way as that of Eq. (1.15).
The analysis is then focused on the specification of the conditional distribution
function $F(\mathbf{r}_t | \mathbf{r}_{t-1}, \ldots, \mathbf{r}_1, \theta)$. In particular, how the conditional expectation and
conditional covariance matrix of $\mathbf{r}_t$ evolve over time constitute the main subjects
of Chapters 8 and 10.

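The mean vector and covariance matrix of a random vector $\mathbf{X} = (X_1, \ldots, X_p)$
are defined as

$$E(\mathbf{X}) = \boldsymbol{\mu}_x = [E(X_1), \ldots, E(X_p)]', \qquad
\mathrm{Cov}(\mathbf{X}) = \boldsymbol{\Sigma}_x = E[(\mathbf{X} - \boldsymbol{\mu}_x)(\mathbf{X} - \boldsymbol{\mu}_x)'],$$

provided that the expectations involved exist. When the data $\{\mathbf{x}_1, \ldots, \mathbf{x}_T\}$ of $\mathbf{X}$
are available, the sample mean and covariance matrix are defined as

$$\hat{\boldsymbol{\mu}}_x = \frac{1}{T} \sum_{t=1}^{T} \mathbf{x}_t, \qquad
\hat{\boldsymbol{\Sigma}}_x = \frac{1}{T-1} \sum_{t=1}^{T} (\mathbf{x}_t - \hat{\boldsymbol{\mu}}_x)(\mathbf{x}_t - \hat{\boldsymbol{\mu}}_x)'.$$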

These sample statistics are consistent estimates of their theoretical counterparts provided
that the covariance matrix of $\mathbf{X}$ exists. In the finance literature, multivariate
normal distribution is often used for the log return $\mathbf{r}_t$.
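
The sample mean vector and covariance matrix defined above can be computed from first principles and compared with the built-in R functions `colMeans` and `cov`. The data matrix in this sketch is made up for illustration.

```r
# Rows of x are the observations x_t (T = 5); columns are the p = 2 variables.
x <- matrix(c( 0.5, -0.2,
              -1.0,  0.4,
               1.5,  0.1,
               0.2, -0.6,
              -0.7,  0.8), ncol = 2, byrow = TRUE)   # made-up data
n_obs <- nrow(x)

mu_hat    <- colSums(x) / n_obs               # sample mean vector
dev       <- sweep(x, 2, mu_hat)              # x_t - mu_hat, one row per t
Sigma_hat <- t(dev) %*% dev / (n_obs - 1)     # sum of outer products / (T - 1)

print(mu_hat)
print(Sigma_hat)
print(all.equal(Sigma_hat, cov(x)))           # agrees with the built-in cov()
```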


1.2.4 Likelihood Function of Returns 


The partition of Eq. (1.15) can be used to obtain the likelihood function of the
log returns $\{r_1, \ldots, r_T\}$ of an asset, where for ease in notation the subscript i is
omitted from the log return. If the conditional distribution $f(r_t | r_{t-1}, \ldots, r_1, \theta)$ is
normal with mean $\mu_t$ and variance $\sigma_t^2$, then $\theta$ consists of the parameters in $\mu_t$ and
$\sigma_t^2$, and the likelihood function of the data is

$$f(r_1, \ldots, r_T; \theta) = f(r_1; \theta) \prod_{t=2}^{T} \frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\left[\frac{-(r_t - \mu_t)^2}{2\sigma_t^2}\right], \tag{1.18}$$


where $f(r_1; \theta)$ is the marginal density function of the first observation $r_1$. The value
of $\theta$ that maximizes this likelihood function is the maximum-likelihood estimate
(MLE) of $\theta$. Since the log function is monotone, the MLE can be obtained by
maximizing the log-likelihood function,

$$\ln f(r_1, \ldots, r_T; \theta) = \ln f(r_1; \theta) - \frac{1}{2} \sum_{t=2}^{T} \left[\ln(2\pi) + \ln(\sigma_t^2) + \frac{(r_t - \mu_t)^2}{\sigma_t^2}\right],$$

which is easier to handle in practice. The log-likelihood function of the data can
be obtained in a similar manner if the conditional distribution $f(r_t | r_{t-1}, \ldots, r_1; \theta)$
is not normal.
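
A minimal sketch of maximizing the Gaussian log-likelihood numerically is given below. It makes several simplifying assumptions that are not part of the text: the conditional mean is taken as $\mu_t = c_0 + \phi r_{t-1}$, the conditional variance is constant, the term $\ln f(r_1; \theta)$ is dropped (a conditional likelihood), and the data are simulated.

```r
# Conditional Gaussian log likelihood: the sum over t = 2, ..., T in the
# log-likelihood function, with the term ln f(r_1; theta) dropped.
# Assumed specification: mu_t = c0 + phi * r_{t-1}, sigma_t^2 = s2 (constant),
# so theta = (c0, phi, log(s2)).
cond_loglik <- function(theta, r) {
  n   <- length(r)
  mu  <- theta[1] + theta[2] * r[-n]   # mu_t for t = 2, ..., T
  s2  <- exp(theta[3])                 # log scale keeps the variance positive
  res <- r[-1] - mu
  -0.5 * sum(log(2 * pi) + log(s2) + res^2 / s2)
}

set.seed(42)                           # simulated data, for illustration only
r <- as.numeric(arima.sim(list(ar = 0.3), n = 500, sd = 0.8)) + 0.1

# optim() minimizes, so pass the negative log likelihood.
fit <- optim(c(0, 0, 0), function(th) -cond_loglik(th, r), method = "BFGS")
est <- c(c0 = fit$par[1], phi = fit$par[2], s2 = exp(fit$par[3]))
print(round(est, 3))
```

Under this specification the estimates of c0 and phi coincide, up to numerical tolerance, with the least-squares regression of $r_t$ on $r_{t-1}$, which provides a convenient check on the optimizer.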


1.2.5 Empirical Properties of Returns 


The data used in this section are obtained from the Center for Research in Secu- 
rity Prices (CRSP) of the University of Chicago. Dividend payments, if any, are 
included in the returns. Figure 1.2 shows the time plots of monthly simple returns 
and log returns of IBM stock from January 1926 to December 2008. A time plot 
shows the data against the time index. The upper plot is for the simple returns. 
Figure 1.3 shows the same plots for the monthly returns of value-weighted market 
index. As expected, the plots show that the basic patterns of simple and log returns 
are similar. 

Figure 1.2 Time plots of monthly returns of IBM stock from January 1926 to December 2008. Upper
panel is for simple returns, and lower panel is for log returns.

Figure 1.3 Time plots of monthly returns of value-weighted index from January 1926 to December
2008. Upper panel is for simple returns, and lower panel is for log returns.

Table 1.2 provides some descriptive statistics of simple and log returns for
selected U.S. market indexes and individual stocks. The returns are for daily and 
monthly sample intervals and are in percentages. The data spans and sample sizes 
are also given in Table 1.2. From the table, we make the following observations. 
(a) Daily returns of the market indexes and individual stocks tend to have high 
excess kurtoses. For monthly series, the returns of market indexes have higher 
excess kurtoses than individual stocks. (b) The mean of a daily return series is close 
to zero, whereas that of a monthly return series is slightly larger. (c) Monthly returns 
have higher standard deviations than daily returns. (d) Among the daily returns, 
market indexes have smaller standard deviations than individual stocks. This is in 
agreement with common sense. (e) The skewness is not a serious problem for both 
daily and monthly returns. (f) The descriptive statistics show that the difference 
between simple and log returns is not substantial. 

Figure 1.4 shows the empirical density functions of monthly simple and log 
returns of IBM stock from 1926 to 2008. Also shown, by a dashed line, in each 
graph is the normal probability density function evaluated by using the sample 
mean and standard deviation of IBM returns given in Table 1.2. The plots indicate 
that the normality assumption is questionable for monthly IBM stock returns. The 
empirical density function has a higher peak around its mean, but fatter tails than 
that of the corresponding normal distribution. In other words, the empirical density
function is taller and skinnier, but with a wider support than the corresponding
normal density.

Figure 1.4 Comparison of empirical and normal densities for monthly simple and log returns of IBM
stock. Sample period is from January 1926 to December 2008. Left plot is for simple returns and right
plot for log returns. Normal density, shown by the dashed line, uses sample mean and standard deviation
given in Table 1.2.


1.3 PROCESSES CONSIDERED 


Besides the return series, we also consider the volatility process and the behavior of 
extreme returns of an asset. The volatility process is concerned with the evolution 
of conditional variance of the return over time. This is a topic of interest because, as 
shown in Figures 1.2 and 1.3, the variabilities of returns vary over time and appear 
in clusters. In application, volatility plays an important role in pricing options and 
risk management. By extremes of a return series, we mean the large positive or 
negative returns. Table 1.2 shows that the minimum and maximum of a return series 
can be substantial. The negative extreme returns are important in risk management, 
whereas positive extreme returns are critical to holding a short position. We study 
properties and applications of extreme returns, such as the frequency of occurrence, 
the size of an extreme, and the impacts of economic variables on the extremes, in 
Chapter 7. 

Other financial time series considered in the book include interest rates, exchange
rates, bond yields, and quarterly earnings per share of a company. Figure 1.5 shows
the time plots of two U.S. monthly interest rates. They are the 10-year and 1-year
Treasury constant maturity rates from April 1953 to February 2009. As expected,
the two interest rates moved in unison, but the 1-year rates appear to be more
volatile. Figure 1.6 shows the daily exchange rate between the U.S. dollar and the
Japanese yen from January 4, 2000, to March 27, 2009. From the plot, the exchange
rate encountered occasional big changes in the sampling period. Table 1.3 provides
some descriptive statistics for selected U.S. financial time series. The monthly bond
returns obtained from CRSP are Fama bond portfolio returns from January 1952 to
December 2008. The interest rates are obtained from the Federal Reserve Bank of
St. Louis. The weekly 3-month Treasury bill rate started on January 8, 1954, and
the 6-month rate started on December 12, 1958. Both series ended on March 27,
2009. For the interest rate series, the sample means increase with the time to
maturity, but the sample standard deviations decrease with the time
to maturity. For the bond returns, the sample standard deviations are positively
related to the time to maturity, whereas the sample means remain stable for all
maturities. Most of the series considered have positive excess kurtoses.

Figure 1.5 Time plots of monthly U.S. interest rates from April 1953 to February 2009: (a) 10-year
Treasury constant maturity rate and (b) 1-year maturity rate.

Figure 1.6 Time plot of daily exchange rate between U.S. dollar and Japanese yen from January 4,
2000, to March 27, 2009: (a) exchange rate and (b) changes in exchange rate.

With respect to the empirical characteristics of returns shown in Table 1.2, 
Chapters 2-4 focus on the first four moments of a return series and Chapter 7 on
the behavior of minimum and maximum returns. Chapters 8 and 10 are concerned 
with moments of and the relationships between multiple asset returns, and Chapter 5 
addresses properties of asset returns when the time interval is small. An introduction 
to mathematical finance is given in Chapter 6. 

TABLE 1.3 Descriptive Statistics of Selected U.S. Financial Time Series^a

                         Standard              Excess
Maturity            Mean Deviation  Skewness  Kurtosis  Minimum  Maximum

Monthly Bond Returns: Jan. 1952 to Dec. 2008, T = 684
1-12 months         0.45      0.35      2.47     13.14    -0.40     3.52
12-24 months        0.49      0.67      1.88     15.44    -2.94     6.85
24-36 months        0.52      0.98      1.37     12.92    -4.90     9.33
48-60 months        0.53      1.40      0.60      4.83    -5.78    10.06
61-120 months       0.55      1.69      0.65      4.79    -7.35    10.92

Monthly Treasury Rates: April 1953 to February 2009, T = 671
1 year              5.59      2.98      1.02      1.32     0.44    16.72
3 years             5.98      2.85      0.95      0.95     1.07    16.22
5 years             6.19      2.77      0.97      0.82     1.52    13.93
10 years            6.40      2.69      0.95      0.61     2.29    15.32

Weekly Treasury Bill Rates: End on March 27, 2009.
3 months            5.07      2.82      1.08      1.80     0.02    16.76
6 months            5.52      2.73      0.99      1.53     0.20    15.76

^a The data are in percentages. The weekly 3-month Treasury bill rate started from January 8, 1954, and
the 6-month rate started from December 12, 1958. The sample sizes for Treasury bill rates are 2882
and 2625, respectively. Data sources are given in the text.


APPENDIX: R PACKAGES 


R is free software available from http://www.r-project.org. One can click CRAN
on its Web page to select a nearby CRAN Mirror to download and install the
software and selected packages. For financial time series analysis, the Rmetrics project of
Diethelm Wuertz and his associates has produced many useful packages, including
fBasics, timeSeries, fGarch, etc. We use many functions of these packages in
this book. Further information concerning installing R and the commands used can
be found either on the Web page of this book or on the author's teaching Web page.

R and S-Plus are object-oriented software. They enable users to create many
objects. For instance, one can use the command ts to create a time series object. 
Treating time series data as a time series object in R has some advantages, but 
it requires some learning to get used to it. It is, however, not necessary to create 
a time series object in R to perform the analyses discussed in this book. As an 
illustration, consider the monthly simple returns to the General Motors stock from 
January 1975 to December 2008; see Exercise 1.2. The data have 408 observations. 
The following R commands are used to illustrate the points: 


da=read.table("m-gm3dx7508.txt",header=T) % Load data
gm=da[,2] % Column 2 contains GM stock returns
gm1=ts(gm,frequency=12,start=c(1975,1)) % Creates a ts object.
par(mfcol=c(2,1)) % Put two plots on a page.
plot(gm,type='l')
plot(gm1,type='l')
acf(gm,lag=24)
acf(gm1,lag=24)

Figure 1.7 Time plots of monthly simple returns to General Motors stock from January 1975 to
December 2008: (a) and (b) are without and with time series object, respectively.



In the ts command, frequency = 12 says that the time unit is year and there 
are 12 equally spaced observations in each time unit, and start = c(1975,1) means 
the starting time is January 1975. Frequency and start are the two basic arguments 
needed in R to create a time series object. For further details, use help(ts)
in R. Here gm1 is a time series object in R, but
gm is not. Figures 1.7 and 1.8 show, respectively, the time plot and autocorrelation
function (ACF) of the returns of GM stock. In each figure, the upper plot is produced
without using a time series object, whereas the lower plot is produced by a
time series object. The upper and lower plots are identical except for the horizontal
label. For the time plot, the time series object uses calendar time to label the x 
axis, which is preferred. On the other hand, for the ACF plot, the time series object 
uses fractions of time unit in the label, not the commonly used time lags. 
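
The difference in axis labels can be seen without the GM data file. The sketch below uses simulated numbers as a stand-in for monthly returns (an assumption made so that the example is self-contained) and inspects the time index and the ACF lags with and without a time series object.

```r
set.seed(7)
x  <- rnorm(48)                                   # simulated stand-in for monthly returns
x1 <- ts(x, frequency = 12, start = c(1975, 1))   # monthly ts object starting January 1975

print(start(x1)); print(end(x1))                  # 1975 1 and 1978 12
print(time(x1)[1:3])                              # calendar time measured in years

# Lags from a ts object are in fractions of the time unit (years here);
# stripping the ts attributes gives lags counted in observations (months).
lag_ts  <- acf(x1, lag.max = 24, plot = FALSE)$lag
lag_vec <- acf(as.numeric(x1), lag.max = 24, plot = FALSE)$lag
print(lag_ts[2])    # 1/12
print(lag_vec[2])   # 1
```

Passing `as.numeric(x1)` to `acf` is one way to keep the calendar-time labels for time plots while still reading the ACF in the commonly used time lags.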


Figure 1.8 Sample ACFs of the monthly simple returns to General Motors stock from January 1975
to December 2008: (a) and (b) are without and with time series object, respectively.


EXERCISES


1.1. Consider the daily stock returns of American Express (AXP), Caterpillar
(CAT), and Starbucks (SBUX) from January 1999 to December 2008. The
data are simple returns given in the file d-3stocks9908.txt (date, axp, cat,
sbux).


(a) Express the simple returns in percentages. Compute the sample mean, 
standard deviation, skewness, excess kurtosis, minimum, and maximum 
of the percentage simple returns. 

(b) Transform the simple returns to log returns. 

(c) Express the log returns in percentages. Compute the sample mean, stan- 
dard deviation, skewness, excess kurtosis, minimum, and maximum of the 
percentage log returns. 

(d) Test the null hypothesis that the mean of the log returns of each stock is 
zero. That is, perform three separate tests. Use 5% significance level to 
draw your conclusion. 

1.2. Answer the same questions as in Exercise 1.1 but using monthly stock returns 
for General Motors (GM), CRSP value-weighted index (VW), CRSP equal- 
weighted index (EW), and S&P composite index from January 1975 to Decem- 
ber 2008. The returns of the indexes include dividend distributions. Data file 
is m-gm3dx7508.txt (date, gm, vw, ew, sp). 

1.3. Consider the monthly stock returns of S&P composite index from January 
1975 to December 2008 in Exercise 1.2. Answer the following questions: 


(a) What is the average annual log return over the data span?

(b) Assume that there were no transaction costs. If one invested $1.00 in the
S&P composite index at the beginning of 1975, what was the value of the
investment at the end of 2008?

1.4. Consider the daily log returns of American Express stock from January 1999
to December 2008 as in Exercise 1.1. Use the 5% significance level to perform
the following tests: (a) Test the null hypothesis that the skewness measure of
the returns is zero. (b) Test the null hypothesis that the excess kurtosis of the
returns is zero.

1.5. Daily foreign exchange rates (spot rates) can be obtained from the Federal
Reserve Bank in Chicago. The data are the noon buying rates in New York City
certified by the Federal Reserve Bank of New York. Consider the exchange
rates between the U.S. dollar and the Canadian dollar, euro, U.K. pound, and
the Japanese yen from January 4, 2000, to March 27, 2009. The data are
also on the Web. (a) Compute the daily log return of each exchange rate.
(b) Compute the sample mean, standard deviation, skewness, excess kurtosis,
minimum, and maximum of the log returns of each exchange rate. (c) Discuss
the empirical characteristics of the log returns of exchange rates. (d) Obtain a
density plot of the daily log returns of the dollar-euro exchange rate.
CHAPTER 2


Linear Time Series Analysis 
and Its Applications 


In this chapter, we discuss basic theories of linear time series analysis, introduce 
some simple econometric models useful for analyzing financial data, and apply the 
models to financial time series such as asset returns. Discussions of the concepts are 
brief with emphasis on those relevant to financial applications. Understanding the
simple time series models introduced here will go a long way toward appreciating
the more sophisticated financial econometric models of the later chapters. There
are many time series textbooks available. For basic concepts of linear time series 
analysis, see Box, Jenkins, and Reinsel (1994, Chapters 2 and 3) and Brockwell 
and Davis (1996, Chapters 1-3). 

Treating an asset return (e.g., log return $r_t$ of a stock) as a collection of random
variables over time, we have a time series $\{r_t\}$. Linear time series analysis provides
a natural framework to study the dynamic structure of such a series. The theories
of linear time series discussed include stationarity, dynamic dependence, autocorrelation
function, modeling, and forecasting. The econometric models introduced
include (a) simple autoregressive (AR) models, (b) simple moving-average (MA)
models, (c) mixed autoregressive moving-average (ARMA) models, (d) seasonal
models, (e) unit-root nonstationarity, (f) regression models with time series errors,
and (g) fractionally differenced models for long-range dependence. For an asset
return $r_t$, simple models attempt to capture the linear relationship between $r_t$ and
information available prior to time $t$. The information may contain the historical
values of $r_t$ and the random vector $Y$ in Eq. (1.14), which describes the economic
environment under which the asset price is determined. As such, correlation
plays an important role in understanding these models. In particular, correlations
between the variable of interest and its past values become the focus of linear
time series analysis. These correlations are referred to as serial correlations or
autocorrelations. They are the basic tool for studying a stationary time series.
2.1 STATIONARITY


The foundation of time series analysis is stationarity. A time series $\{r_t\}$ is said to
be strictly stationary if the joint distribution of $(r_{t_1}, \ldots, r_{t_k})$ is identical to that of
$(r_{t_1+t}, \ldots, r_{t_k+t})$ for all $t$, where $k$ is an arbitrary positive integer and $(t_1, \ldots, t_k)$ is
a collection of $k$ positive integers. In other words, strict stationarity requires that the
joint distribution of $(r_{t_1}, \ldots, r_{t_k})$ is invariant under time shift. This is a very strong
condition that is hard to verify empirically. A weaker version of stationarity is often
assumed. A time series $\{r_t\}$ is weakly stationary if both the mean of $r_t$ and the
covariance between $r_t$ and $r_{t-\ell}$ are time invariant, where $\ell$ is an arbitrary integer.
More specifically, $\{r_t\}$ is weakly stationary if (a) $E(r_t) = \mu$, which is a constant,
and (b) $\mathrm{Cov}(r_t, r_{t-\ell}) = \gamma_\ell$, which only depends on $\ell$. In practice, suppose that we
have observed $T$ data points $\{r_t \mid t = 1, \ldots, T\}$. The weak stationarity implies that
the time plot of the data would show that the $T$ values fluctuate with constant
variation around a fixed level. In applications, weak stationarity enables one to
make inference concerning future observations (e.g., prediction).

Implicitly, in the condition of weak stationarity, we assume that the first two
moments of $r_t$ are finite. From the definitions, if $r_t$ is strictly stationary and its
first two moments are finite, then $r_t$ is also weakly stationary. The converse is
not true in general. However, if the time series $r_t$ is normally distributed, then
weak stationarity is equivalent to strict stationarity. In this book, we are mainly
concerned with weakly stationary series.

The covariance $\gamma_\ell = \mathrm{Cov}(r_t, r_{t-\ell})$ is called the lag-$\ell$ autocovariance of $r_t$. It has
two important properties: (a) $\gamma_0 = \mathrm{Var}(r_t)$ and (b) $\gamma_{-\ell} = \gamma_\ell$. The second property
holds because $\mathrm{Cov}(r_t, r_{t-(-\ell)}) = \mathrm{Cov}(r_{t-(-\ell)}, r_t) = \mathrm{Cov}(r_{t+\ell}, r_t) = \mathrm{Cov}(r_{t_1}, r_{t_1-\ell})$,
where $t_1 = t + \ell$.

In the finance literature, it is common to assume that an asset return series is 
weakly stationary. This assumption can be checked empirically provided that a 
sufficient number of historical returns are available. For example, one can divide 
the data into subsamples and check the consistency of the results obtained across 
the subsamples. 
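
The subsample check mentioned above might be sketched as follows. The series r is simulated for illustration only, and the choice of four equal blocks is an assumption, not a recommendation from the text.

```r
# Simulated returns stand in for a real series (illustrative only).
set.seed(42)
r <- rnorm(1000, mean = 0.005, sd = 0.05)

# Split the sample into four consecutive blocks of equal length
# and compare the mean and standard deviation across blocks.
blocks <- split(r, rep(1:4, each = length(r) / 4))
sub_stats <- sapply(blocks, function(b) c(mean = mean(b), sd = sd(b)))
print(round(sub_stats, 4))
```

Roughly similar means and standard deviations across the blocks are consistent with weak stationarity; large differences would call the assumption into question.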


2.2 CORRELATION AND AUTOCORRELATION FUNCTION 


The correlation coefficient between two random variables $X$ and $Y$ is defined as

$$\rho_{x,y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{E[(X - \mu_x)(Y - \mu_y)]}{\sqrt{E(X - \mu_x)^2\, E(Y - \mu_y)^2}},$$

where $\mu_x$ and $\mu_y$ are the means of $X$ and $Y$, respectively, and it is assumed that the
variances exist. This coefficient measures the strength of linear dependence between
$X$ and $Y$, and it can be shown that $-1 \le \rho_{x,y} \le 1$ and $\rho_{x,y} = \rho_{y,x}$. The two random
variables are uncorrelated if $\rho_{x,y} = 0$. In addition, if both $X$ and $Y$ are normal
random variables, then $\rho_{x,y} = 0$ if and only if $X$ and $Y$ are independent. When the
sample $\{(x_t, y_t)\}_{t=1}^{T}$ is available, the correlation can be consistently estimated by
its sample counterpart

$$\hat{\rho}_{x,y} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^{T} (x_t - \bar{x})^2 \sum_{t=1}^{T} (y_t - \bar{y})^2}},$$

where $\bar{x} = \sum_{t=1}^{T} x_t / T$ and $\bar{y} = \sum_{t=1}^{T} y_t / T$ are the sample means of $X$ and $Y$,
respectively.
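
As a quick illustration of the sample counterpart, the sketch below evaluates the formula directly on simulated data and compares it with R's built-in cor(). The variables x and y are made up for the example.

```r
# Simulated pair of linearly related variables (illustrative only).
set.seed(7)
x <- rnorm(200)
y <- 0.5 * x + rnorm(200)

# Sample correlation computed term by term from the definition.
xbar <- mean(x)
ybar <- mean(y)
rho_hat <- sum((x - xbar) * (y - ybar)) /
  sqrt(sum((x - xbar)^2) * sum((y - ybar)^2))

# Built-in estimate for comparison.
rho_builtin <- cor(x, y)
```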


Autocorrelation Function (ACF) 

Consider a weakly stationary return series $r_t$. When the linear dependence between
$r_t$ and its past values $r_{t-i}$ is of interest, the concept of correlation is generalized
to autocorrelation. The correlation coefficient between $r_t$ and $r_{t-\ell}$ is called the
lag-$\ell$ autocorrelation of $r_t$ and is commonly denoted by $\rho_\ell$, which under the weak
stationarity assumption is a function of $\ell$ only. Specifically, we define

$$\rho_\ell = \frac{\mathrm{Cov}(r_t, r_{t-\ell})}{\sqrt{\mathrm{Var}(r_t)\,\mathrm{Var}(r_{t-\ell})}} = \frac{\mathrm{Cov}(r_t, r_{t-\ell})}{\mathrm{Var}(r_t)} = \frac{\gamma_\ell}{\gamma_0}, \tag{2.1}$$

where the property $\mathrm{Var}(r_t) = \mathrm{Var}(r_{t-\ell})$ for a weakly stationary series is used. From
the definition, we have $\rho_0 = 1$, $\rho_\ell = \rho_{-\ell}$, and $-1 \le \rho_\ell \le 1$. In addition, a weakly
stationary series $r_t$ is not serially correlated if and only if $\rho_\ell = 0$ for all $\ell > 0$.
For a given sample of returns $\{r_t\}_{t=1}^{T}$, let $\bar{r}$ be the sample mean (i.e., $\bar{r} =
\sum_{t=1}^{T} r_t / T$). Then the lag-1 sample autocorrelation of $r_t$ is

$$\hat{\rho}_1 = \frac{\sum_{t=2}^{T} (r_t - \bar{r})(r_{t-1} - \bar{r})}{\sum_{t=1}^{T} (r_t - \bar{r})^2}.$$

Under some general conditions, $\hat{\rho}_1$ is a consistent estimate of $\rho_1$. For example, if
$\{r_t\}$ is an independent and identically distributed (iid) sequence and $E(r_t^2) < \infty$,
then $\hat{\rho}_1$ is asymptotically normal with mean zero and variance $1/T$; see Brockwell
and Davis (1991, Theorem 7.2.2). This result can be used in practice to test the
null hypothesis $H_0 : \rho_1 = 0$ versus the alternative hypothesis $H_a : \rho_1 \neq 0$. The test
statistic is the usual t ratio, which is $\sqrt{T}\hat{\rho}_1$ and follows asymptotically the standard
normal distribution. The null hypothesis $H_0$ is rejected if the t ratio is large in
magnitude or, equivalently, the p value of the t ratio is small, say less than 0.05.
In general, the lag-$\ell$ sample autocorrelation of $r_t$ is defined as

$$\hat{\rho}_\ell = \frac{\sum_{t=\ell+1}^{T} (r_t - \bar{r})(r_{t-\ell} - \bar{r})}{\sum_{t=1}^{T} (r_t - \bar{r})^2}, \quad 0 \le \ell < T - 1. \tag{2.2}$$

If $\{r_t\}$ is an iid sequence satisfying $E(r_t^2) < \infty$, then $\hat{\rho}_\ell$ is asymptotically normal
with mean zero and variance $1/T$ for any fixed positive integer $\ell$. More generally,
if $r_t$ is a weakly stationary time series satisfying $r_t = \mu + \sum_{i=0}^{q} \psi_i a_{t-i}$, where
$\psi_0 = 1$ and $\{a_j\}$ is a sequence of iid random variables with mean zero, then $\hat{\rho}_\ell$ is
asymptotically normal with mean zero and variance $(1 + 2 \sum_{i=1}^{q} \rho_i^2)/T$ for $\ell > q$.
This is referred to as Bartlett's formula in the time series literature; see Box,
Jenkins, and Reinsel (1994). For more information about the asymptotic distribution
of sample autocorrelations, see Fuller (1976, Chapter 6) and Brockwell and Davis
(1991, Chapter 7).
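
To make Eq. (2.2) concrete, the following sketch computes the lag-$\ell$ sample autocorrelation from the definition and compares it with the values returned by R's acf(). The helper sample_acf and the simulated series r are illustrative names introduced here.

```r
# Simulated iid series (illustrative only).
set.seed(123)
r <- rnorm(500)
T_len <- length(r)

# Lag-l sample autocorrelation written out as in Eq. (2.2).
sample_acf <- function(r, lag) {
  n <- length(r)
  m <- mean(r)
  sum((r[(lag + 1):n] - m) * (r[1:(n - lag)] - m)) / sum((r - m)^2)
}

rho_manual  <- sapply(1:5, function(l) sample_acf(r, l))
rho_builtin <- as.vector(acf(r, lag.max = 5, plot = FALSE)$acf)[-1]  # drop lag 0

# t ratio for H0: rho_1 = 0 under the iid assumption.
t_ratio_1 <- sqrt(T_len) * rho_manual[1]
```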


Testing Individual ACF 
For a given positive integer $\ell$, the previous result can be used to test $H_0 : \rho_\ell = 0$
vs. $H_a : \rho_\ell \neq 0$. The test statistic is

$$t\ \text{ratio} = \frac{\hat{\rho}_\ell}{\sqrt{\left(1 + 2 \sum_{i=1}^{\ell-1} \hat{\rho}_i^2\right)/T}}.$$

If $\{r_t\}$ is a stationary Gaussian series satisfying $\rho_j = 0$ for $j > \ell$, the t ratio
is asymptotically distributed as a standard normal random variable. Hence, the
decision rule of the test is to reject $H_0$ if $|t\ \text{ratio}| > Z_{\alpha/2}$, where $Z_{\alpha/2}$ is the
$100(1 - \alpha/2)$th percentile of the standard normal distribution. For simplicity, many
software packages use $1/T$ as the asymptotic variance of $\hat{\rho}_\ell$ for all $\ell \neq 0$. They
essentially assume that the underlying time series is an iid sequence.

In finite samples, $\hat{\rho}_\ell$ is a biased estimator of $\rho_\ell$. The bias is in the order of
$1/T$, which can be substantial when the sample size $T$ is small. In most financial
applications, $T$ is relatively large so that the bias is not serious.
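
A minimal sketch of the individual test follows, assuming the Bartlett-type variance in the t ratio above. The function name acf_t_ratio and the simulated series are illustrative and not part of any package.

```r
# t ratio for H0: rho_l = 0, using (1 + 2 * sum_{i<l} rho_i^2) / T as variance.
acf_t_ratio <- function(r, lag) {
  n <- length(r)
  rho <- as.vector(acf(r, lag.max = lag, plot = FALSE)$acf)[-1]  # rho_1..rho_lag
  prior <- if (lag > 1) rho[1:(lag - 1)] else numeric(0)
  rho[lag] / sqrt((1 + 2 * sum(prior^2)) / n)
}

set.seed(2024)
r <- rnorm(400)                       # simulated series (illustrative only)
t3 <- acf_t_ratio(r, 3)
reject <- abs(t3) > qnorm(0.975)      # two-sided test at the 5% level
```

For lag 1 the sum is empty, so the statistic reduces to $\sqrt{T}\hat{\rho}_1$, which is also what the simpler $1/T$ variance gives.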


Portmanteau Test 
Financial applications often require testing jointly that several autocorrelations of
$r_t$ are zero. Box and Pierce (1970) propose the Portmanteau statistic

$$Q^*(m) = T \sum_{\ell=1}^{m} \hat{\rho}_\ell^2$$

as a test statistic for the null hypothesis $H_0 : \rho_1 = \cdots = \rho_m = 0$ against the alternative
hypothesis $H_a : \rho_i \neq 0$ for some $i \in \{1, \ldots, m\}$. Under the assumption that
$\{r_t\}$ is an iid sequence with certain moment conditions, $Q^*(m)$ is asymptotically a
chi-squared random variable with $m$ degrees of freedom.

Ljung and Box (1978) modify the $Q^*(m)$ statistic as below to increase the power
of the test in finite samples,

$$Q(m) = T(T + 2) \sum_{\ell=1}^{m} \frac{\hat{\rho}_\ell^2}{T - \ell}. \tag{2.3}$$

The decision rule is to reject $H_0$ if $Q(m) > \chi_\alpha^2$, where $\chi_\alpha^2$ denotes the $100(1 - \alpha)$th
percentile of a chi-squared distribution with $m$ degrees of freedom. Most software
packages will provide the p value of $Q(m)$. The decision rule is then to reject $H_0$
if the p value is less than or equal to $\alpha$, the significance level.

In practice, the choice of m may affect the performance of the Q(m) statistic. 
Several values of m are often used. Simulation studies suggest that the choice of
$m \approx \ln(T)$ provides better power performance. This general rule needs modification
in analysis of seasonal time series for which autocorrelations with lags at multiples 
of the seasonality are more important. 
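
The Ljung-Box statistic of Eq. (2.3) can be written out directly and compared with R's Box.test. In this sketch the function ljung_box and the simulated series r are illustrative, and m is set to ln(T) rounded to the nearest integer, following the rule of thumb above.

```r
# Ljung-Box Q(m) computed from Eq. (2.3), with its chi-squared p value.
ljung_box <- function(r, m) {
  n <- length(r)
  rho <- as.vector(acf(r, lag.max = m, plot = FALSE)$acf)[-1]
  Q <- n * (n + 2) * sum(rho^2 / (n - 1:m))
  c(Q = Q, p.value = pchisq(Q, df = m, lower.tail = FALSE))
}

set.seed(99)
r <- rnorm(300)                 # simulated series (illustrative only)
m <- round(log(length(r)))      # m close to ln(T)
manual  <- ljung_box(r, m)
builtin <- Box.test(r, lag = m, type = "Ljung-Box")
```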

The statistics $\hat{\rho}_1, \hat{\rho}_2, \ldots$ defined in Eq. (2.2) are called the sample autocorrelation
function (ACF) of $r_t$. It plays an important role in linear time series analysis.
As a matter of fact, a linear time series model can be characterized by its ACF,
and linear time series modeling makes use of the sample ACF to capture the linear
dynamic of the data. Figure 2.1 shows the sample autocorrelation functions
of monthly simple and log returns of IBM stock from January 1926 to December
2008. The two sample ACFs are very close to each other, and they suggest
that the serial correlations of monthly IBM stock returns are very small, if any.
The sample ACFs are all within their two standard error limits, indicating that
they are not significantly different from zero at the 5% level. In addition, for the
simple returns, the Ljung-Box statistics give Q(5) = 3.37 and Q(10) = 13.99,
which correspond to p values of 0.64 and 0.17, respectively, based on chi-squared
distributions with 5 and 10 degrees of freedom. For the log returns, we have
Q(5) = 3.52 and Q(10) = 13.39 with p values 0.62 and 0.20, respectively. The
joint tests confirm that monthly IBM stock returns have no significant serial correlations.
Figure 2.2 shows the same for the monthly returns of the value-weighted
index from the Center for Research in Security Prices (CRSP), at the University
of Chicago. There are some significant serial correlations at the 5% level for both
return series. The Ljung-Box statistics give Q(5) = 29.71 and Q(10) = 39.55 for
the simple returns and Q(5) = 28.38 and Q(10) = 36.16 for the log returns. The
p values of these four test statistics are all less than 0.0001, suggesting that monthly
returns of the value-weighted index are serially correlated. Thus, the monthly market
index return seems to have stronger serial dependence than individual stock
returns.

Figure 2.1 Sample autocorrelation functions of monthly (a) simple returns and (b) log returns of
IBM stock from January 1926 to December 2008. In each plot, two horizontal dashed lines denote two
standard error limits of sample ACF.

Figure 2.2 Sample autocorrelation functions of monthly (a) simple returns and (b) log returns of value-
weighted index of U.S. markets from January 1926 to December 2008. In each plot, two horizontal
dashed lines denote two standard error limits of sample ACF.

In the finance literature, a version of the capital asset pricing model (CAPM)
theory is that the return $\{r_t\}$ of an asset is not predictable and should have no
autocorrelations. Testing for zero autocorrelations has been used as a tool to check the
efficient market assumption. However, the way by which stock prices are determined
and index returns are calculated might introduce autocorrelations in the
observed return series. This is particularly so in analysis of high-frequency financial
data. We discuss some of these issues, such as bid-ask bounce and nonsynchronous
trading, in Chapter 5.
R Demonstration
The following output has been edited and % denotes explanation:

> da=read.table("m-ibm3dx2608.txt",header=T) % Load data
> da[1,] % Check the 1st row of the data
      date       rtn    vwrtn    ewrtn    sprtn
1 19260130 -0.010381 0.000724 0.023174 0.022472
> sibm=da[,2] % Get the IBM simple returns
> Box.test(sibm,lag=5,type='Ljung') % Ljung-Box statistic Q(5)

        Box-Ljung test

data: sibm
X-squared = 3.3682, df = 5, p-value = 0.6434

> libm=log(sibm+1) % Log IBM returns
> Box.test(libm,lag=5,type='Ljung')

        Box-Ljung test

data: libm
X-squared = 3.5236, df = 5, p-value = 0.6198


S-Plus Demonstration 
Output edited. 


> module(finmetrics) 
> da=read.table("m-ibm3dx2608.txt",header=T) % Load data 
> da[1,] % Check the 1st row of the data 
date rtn vwrtn ewrtn sprtn 
1 19260130 -0.010381 0.000724 0.023174 0.022472 
> sibm=da[,2] % Get IBM simple returns 
> autocorTest(sibm,lag=5) % Ljung-Box Q(5) test 


Test for Autocorrelation: Ljung-Box 
Null Hypothesis: no autocorrelation 
Test Statistics: 
Test Stat 3.3682 
p.value 0.6434 


Dist. under Null: chi-square with 5 degrees of freedom 
Total Observ.: 996 

> libm=log(sibm+1) % IBM log returns 

> autocorTest(libm, lag=5) 


Test for Autocorrelation: Ljung-Box 
Null Hypothesis: no autocorrelation 
Test Statistics: 
Test Stat 3.5236 
p.value 0.6198 
2.3 WHITE NOISE AND LINEAR TIME SERIES


White Noise 

A time series $r_t$ is called a white noise if $\{r_t\}$ is a sequence of independent and
identically distributed random variables with finite mean and variance. In particular,
if $r_t$ is normally distributed with mean zero and variance $\sigma^2$, the series is called a
Gaussian white noise. For a white noise series, all the ACFs at nonzero lags are zero.
In practice, if all sample ACFs are close to zero, then the series is regarded as a white
noise series. Based on Figures 2.1 and 2.2, the monthly returns of IBM stock are close
to white noise, whereas those of the value-weighted index are not.

The behavior of sample autocorrelations of the value-weighted index returns 
indicates that for some asset returns it is necessary to model the serial dependence 
before further analysis can be made. In what follows, we discuss some simple time 
series models that are useful in modeling the dynamic structure of a time series. 
The concepts presented are also useful later in modeling volatility of asset returns. 


Linear Time Series 
A time series $r_t$ is said to be linear if it can be written as

$$r_t = \mu + \sum_{i=0}^{\infty} \psi_i a_{t-i}, \tag{2.4}$$

where $\mu$ is the mean of $r_t$, $\psi_0 = 1$, and $\{a_t\}$ is a sequence of iid random variables
with mean zero and a well-defined distribution (i.e., $\{a_t\}$ is a white noise series). It
will be seen later that $a_t$ denotes the new information at time $t$ of the time series
and is often referred to as the innovation or shock at time $t$. In this book, we are
mainly concerned with the case where $a_t$ is a continuous random variable. Not
all financial time series are linear, however. We study nonlinearity and nonlinear
models in Chapter 4.

For a linear time series in Eq. (2.4), the dynamic structure of $r_t$ is governed by
the coefficients $\psi_i$, which are called the $\psi$ weights of $r_t$ in the time series literature.
If $r_t$ is weakly stationary, we can obtain its mean and variance easily by using the
independence of $\{a_t\}$ as

$$E(r_t) = \mu, \quad \mathrm{Var}(r_t) = \sigma_a^2 \sum_{i=0}^{\infty} \psi_i^2, \tag{2.5}$$

where $\sigma_a^2$ is the variance of $a_t$. Because $\mathrm{Var}(r_t) < \infty$, $\{\psi_i^2\}$ must be a convergent
sequence, that is, $\psi_i^2 \to 0$ as $i \to \infty$. Consequently, for a stationary series, the impact
of the remote shock $a_{t-i}$ on the return $r_t$ vanishes as $i$ increases.

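The lag-$\ell$ autocovariance of $r_t$ is

$$\gamma_\ell = \mathrm{Cov}(r_t, r_{t-\ell}) = E\left[\left(\sum_{i=0}^{\infty} \psi_i a_{t-i}\right)\left(\sum_{j=0}^{\infty} \psi_j a_{t-\ell-j}\right)\right]$$

$$= E\left(\sum_{i,j=0}^{\infty} \psi_i \psi_j a_{t-i} a_{t-\ell-j}\right)$$

$$= \sum_{j=0}^{\infty} \psi_{j+\ell} \psi_j E(a_{t-\ell-j}^2) = \sigma_a^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+\ell}. \tag{2.6}$$

Consequently, the $\psi$ weights are related to the autocorrelations of $r_t$ as follows:

$$\rho_\ell = \frac{\gamma_\ell}{\gamma_0} = \frac{\sum_{i=0}^{\infty} \psi_i \psi_{i+\ell}}{1 + \sum_{i=1}^{\infty} \psi_i^2}, \quad \ell \ge 0, \tag{2.7}$$

where $\psi_0 = 1$. Linear time series models are econometric and statistical models
used to describe the pattern of the $\psi$ weights of $r_t$. For a weakly stationary time
series, $\psi_i \to 0$ as $i \to \infty$ and, hence, $\rho_\ell$ converges to zero as $\ell$ increases. For asset
returns, this means that, as expected, the linear dependence of current return $r_t$ on
the remote past return $r_{t-\ell}$ diminishes for large $\ell$.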
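
As a small numerical illustration of Eq. (2.7), the sketch below takes a linear series with only three nonzero $\psi$ weights (the values 1, 0.5, and 0.3 are arbitrary choices made for this example) and computes its autocorrelations from the weights. R's ARMAacf is used only as an independent cross-check.

```r
# psi_0, psi_1, psi_2; all later weights are assumed to be zero.
psi <- c(1, 0.5, 0.3)

# Autocorrelation at a given lag from Eq. (2.7) with finitely many weights.
acf_from_psi <- function(psi, lag) {
  n <- length(psi)
  if (lag >= n) return(0)
  sum(psi[1:(n - lag)] * psi[(1 + lag):n]) / sum(psi^2)
}

rho <- sapply(0:3, function(l) acf_from_psi(psi, l))

# Cross-check: the same process expressed through its moving-average weights.
rho_check <- as.vector(ARMAacf(ma = psi[-1], lag.max = 3))
```

Because the weights vanish beyond lag 2, the autocorrelations are exactly zero from lag 3 onward.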


2.4 SIMPLE AR MODELS 


The fact that the monthly return $r_t$ of CRSP value-weighted index has a statistically
significant lag-1 autocorrelation indicates that the lagged return $r_{t-1}$ might be useful
in predicting $r_t$. A simple model that makes use of such predictive power is

$$r_t = \phi_0 + \phi_1 r_{t-1} + a_t, \tag{2.8}$$

where $\{a_t\}$ is assumed to be a white noise series with mean zero and variance
$\sigma_a^2$. This model is in the same form as the well-known simple linear regression
model in which $r_t$ is the dependent variable and $r_{t-1}$ is the explanatory variable.
In the time series literature, model (2.8) is referred to as an autoregressive (AR)
model of order 1 or simply an AR(1) model. This simple model is also widely
used in stochastic volatility modeling when $r_t$ is replaced by its log volatility; see
Chapters 3 and 12.

The AR(1) model in Eq. (2.8) has several properties similar to those of the
simple linear regression model. However, there are some significant differences
between the two models, which we discuss later. Here it suffices to note that an
AR(1) model implies that, conditional on the past return $r_{t-1}$, we have

$$E(r_t \mid r_{t-1}) = \phi_0 + \phi_1 r_{t-1}, \quad \mathrm{Var}(r_t \mid r_{t-1}) = \mathrm{Var}(a_t) = \sigma_a^2.$$

That is, given the past return $r_{t-1}$, the current return is centered around $\phi_0 + \phi_1 r_{t-1}$
with standard deviation $\sigma_a$. This is a Markov property such that conditional on $r_{t-1}$,
the return $r_t$ is not correlated with $r_{t-i}$ for $i > 1$. Obviously, there are situations
in which $r_{t-1}$ alone cannot determine the conditional expectation of $r_t$ and a more
flexible model must be sought. A straightforward generalization of the AR(1) model
is the AR($p$) model:

$$r_t = \phi_0 + \phi_1 r_{t-1} + \cdots + \phi_p r_{t-p} + a_t, \tag{2.9}$$

where $p$ is a nonnegative integer and $\{a_t\}$ is defined in Eq. (2.8). This model
says that the past $p$ variables $r_{t-i}$ ($i = 1, \ldots, p$) jointly determine the conditional
expectation of $r_t$ given the past data. The AR($p$) model is in the same form as
a multiple linear regression model with lagged values serving as the explanatory
variables.
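
The regression form of the AR(1) model can be illustrated with a short simulation. This is a minimal sketch under assumed parameter values ($\phi_0 = 0.01$, $\phi_1 = 0.6$, $\sigma_a = 0.05$, all chosen arbitrarily); it generates the series from Eq. (2.8) and regresses $r_t$ on $r_{t-1}$ by ordinary least squares.

```r
# Assumed parameter values for the illustration.
set.seed(11)
n <- 2000
phi0 <- 0.01
phi1 <- 0.6
a <- rnorm(n, mean = 0, sd = 0.05)   # white noise shocks

# Generate r_t = phi0 + phi1 * r_{t-1} + a_t recursively.
r <- numeric(n)
r[1] <- phi0 + a[1]
for (i in 2:n) r[i] <- phi0 + phi1 * r[i - 1] + a[i]

# Regress r_t on r_{t-1}: r[-1] holds r_2..r_n, r[-n] holds r_1..r_{n-1}.
fit <- lm(r[-1] ~ r[-n])
est <- coef(fit)    # intercept estimates phi0, slope estimates phi1
```

With a sample of this size the least squares estimates should be close to the assumed values, although the exact numbers depend on the seed.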


2.4.1 Properties of AR Models 


For effective use of AR models, it pays to study their basic properties. We discuss 
properties of AR(1) and AR(2) models in detail and give the results for the general 
AR(p) model. 


AR(1) Model 

We begin with the sufficient and necessary condition for weak stationarity of the
AR(1) model in Eq. (2.8). Assuming that the series is weakly stationary, we have
$E(r_t) = \mu$, $\mathrm{Var}(r_t) = \gamma_0$, and $\mathrm{Cov}(r_t, r_{t-j}) = \gamma_j$, where $\mu$ and $\gamma_0$ are constant and
$\gamma_j$ is a function of $j$, not $t$. We can easily obtain the mean, variance, and autocorrelations
of the series as follows. Taking the expectation of Eq. (2.8) and because
$E(a_t) = 0$, we obtain

$$E(r_t) = \phi_0 + \phi_1 E(r_{t-1}).$$

Under the stationarity condition, $E(r_t) = E(r_{t-1}) = \mu$ and hence

$$\mu = \phi_0 + \phi_1 \mu \quad \text{or} \quad E(r_t) = \mu = \frac{\phi_0}{1 - \phi_1}.$$

This result has two implications for $r_t$. First, the mean of $r_t$ exists if $\phi_1 \neq 1$.
Second, the mean of $r_t$ is zero if and only if $\phi_0 = 0$. Thus, for a stationary AR(1)
process, the constant term $\phi_0$ is related to the mean of $r_t$ via $\phi_0 = (1 - \phi_1)\mu$ and
$\phi_0 = 0$ implies that $E(r_t) = 0$.

Next, using $\phi_0 = (1 - \phi_1)\mu$, the AR(1) model can be rewritten as

$$r_t - \mu = \phi_1 (r_{t-1} - \mu) + a_t. \tag{2.10}$$

By repeated substitutions, the prior equation implies that

$$r_t - \mu = a_t + \phi_1 a_{t-1} + \phi_1^2 a_{t-2} + \cdots = \sum_{i=0}^{\infty} \phi_1^i a_{t-i}. \tag{2.11}$$
This equation expresses an AR(1) model in the form of Eq. (2.4) with $\psi_i = \phi_1^i$.
Thus, $r_t - \mu$ is a linear function of $a_{t-i}$ for $i \ge 0$. Using this property and the
independence of the series $\{a_t\}$, we obtain $E[(r_t - \mu)a_{t+1}] = 0$. By the stationarity
assumption, we have $\mathrm{Cov}(r_{t-1}, a_t) = E[(r_{t-1} - \mu)a_t] = 0$. This latter result can
also be seen from the fact that $r_{t-1}$ occurred before time $t$ and $a_t$ does not depend
on any past information. Taking the square, then the expectation of Eq. (2.10), we
obtain

$$\mathrm{Var}(r_t) = \phi_1^2 \mathrm{Var}(r_{t-1}) + \sigma_a^2,$$

where $\sigma_a^2$ is the variance of $a_t$ and we make use of the fact that the covariance
between $r_{t-1}$ and $a_t$ is zero. Under the stationarity assumption, $\mathrm{Var}(r_t) = \mathrm{Var}(r_{t-1})$,
so that

$$\mathrm{Var}(r_t) = \frac{\sigma_a^2}{1 - \phi_1^2},$$

provided that $\phi_1^2 < 1$. The requirement of $\phi_1^2 < 1$ results from the fact that the
variance of a random variable is bounded and nonnegative. Consequently, the weak
stationarity of an AR(1) model implies that $-1 < \phi_1 < 1$, that is, $|\phi_1| < 1$. Conversely, if
$|\phi_1| < 1$, then by Eq. (2.11) and the independence of the $\{a_t\}$ series, we can show
that the mean and variance of $r_t$ are finite and time invariant; see Eq. (2.5). In
addition, by Eq. (2.6), all the autocovariances of $r_t$ are finite. Therefore, the AR(1)
model is weakly stationary. In summary, the necessary and sufficient condition for
the AR(1) model in Eq. (2.8) to be weakly stationary is $|\phi_1| < 1$.
Using $\phi_0 = (1 - \phi_1)\mu$, one can rewrite a stationary AR(1) model as

$$r_t = (1 - \phi_1)\mu + \phi_1 r_{t-1} + a_t.$$

This model is often used in the finance literature with $\phi_1$ measuring the persistence
of the dynamic dependence of an AR(1) time series.
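
The link between Eqs. (2.11) and (2.5) can be verified numerically. In this sketch $\phi_1 = 0.8$ and $\sigma_a = 1$ are assumed values; R's ARMAtoMA returns the $\psi$ weights of the model starting from $\psi_1$, and the infinite sum in Eq. (2.5) is truncated at 200 terms.

```r
phi1 <- 0.8       # assumed AR coefficient, |phi1| < 1
sigma_a <- 1      # assumed standard deviation of the shocks

# psi_0, ..., psi_200 of the AR(1) model; psi_0 = 1 is prepended by hand.
psi <- c(1, ARMAtoMA(ar = phi1, lag.max = 200))
psi_closed <- phi1^(0:200)                      # psi_i = phi1^i from Eq. (2.11)

# Variance from the truncated psi-weight sum and from the closed form.
var_from_psi <- sigma_a^2 * sum(psi^2)
var_closed   <- sigma_a^2 / (1 - phi1^2)
```

The two variance values agree up to a truncation error that is negligible here because $\phi_1^{2i}$ decays geometrically.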


Autocorrelation Function of an AR(1) Model 
Multiplying Eq. (2.10) by $a_t$, using the independence between $a_t$ and $r_{t-1}$, and
taking expectation, we obtain

$$E[a_t (r_t - \mu)] = \phi_1 E[a_t (r_{t-1} - \mu)] + E(a_t^2) = E(a_t^2) = \sigma_a^2,$$

where $\sigma_a^2$ is the variance of $a_t$. Multiplying Eq. (2.10) by $r_{t-\ell} - \mu$, taking expectation,
and using the prior result, we have

$$\gamma_\ell = \begin{cases} \phi_1 \gamma_1 + \sigma_a^2 & \text{if } \ell = 0 \\ \phi_1 \gamma_{\ell-1} & \text{if } \ell > 0, \end{cases}$$

where we use $\gamma_\ell = \gamma_{-\ell}$. Consequently, for a weakly stationary AR(1) model in
Eq. (2.8), we have

$$\mathrm{Var}(r_t) = \gamma_0 = \frac{\sigma_a^2}{1 - \phi_1^2} \quad \text{and} \quad \gamma_\ell = \phi_1 \gamma_{\ell-1}, \quad \text{for } \ell > 0.$$

From the latter equation, the ACF of $r_t$ satisfies

$$\rho_\ell = \phi_1 \rho_{\ell-1}, \quad \text{for } \ell > 0.$$

Because $\rho_0 = 1$, we have $\rho_\ell = \phi_1^\ell$. This result says that the ACF of a weakly
stationary AR(1) series decays exponentially with rate $\phi_1$ and starting value $\rho_0 = 1$.
For a positive $\phi_1$, the plot of ACF of an AR(1) model shows a nice exponential
decay. For a negative $\phi_1$, the plot consists of two alternating exponential decays
with rate $\phi_1^2$. Figure 2.3 shows the ACF of two AR(1) models with $\phi_1 = 0.8$ and
$\phi_1 = -0.8$.
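
The two cases $\phi_1 = 0.8$ and $\phi_1 = -0.8$ can be reproduced numerically. The sketch below uses R's ARMAacf to obtain the theoretical ACF and compares it with the closed form $\rho_\ell = \phi_1^\ell$; the choice of 10 lags is arbitrary.

```r
lags <- 0:10

# Theoretical ACF of the two AR(1) models, lags 0 through 10.
acf_pos <- as.vector(ARMAacf(ar = 0.8,  lag.max = 10))
acf_neg <- as.vector(ARMAacf(ar = -0.8, lag.max = 10))

# Closed form rho_l = phi1^l for comparison.
closed_pos <- 0.8^lags
closed_neg <- (-0.8)^lags
```

For the negative coefficient the values alternate in sign, which produces the two interleaved decays described above.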


Figure 2.3 Autocorrelation function of an AR(1) model: (a) for $\phi_1 = 0.8$ and (b) for $\phi_1 = -0.8$.


AR(2) Model
An AR(2) model assumes the form

$$r_t = \phi_0 + \phi_1 r_{t-1} + \phi_2 r_{t-2} + a_t. \tag{2.12}$$

Using the same technique as that of the AR(1) case, we obtain

$$E(r_t) = \mu = \frac{\phi_0}{1 - \phi_1 - \phi_2},$$

provided that $\phi_1 + \phi_2 \neq 1$. Using $\phi_0 = (1 - \phi_1 - \phi_2)\mu$, we can rewrite the AR(2)
model as

$$(r_t - \mu) = \phi_1 (r_{t-1} - \mu) + \phi_2 (r_{t-2} - \mu) + a_t.$$

Multiplying the prior equation by $(r_{t-\ell} - \mu)$, we have

$$(r_{t-\ell} - \mu)(r_t - \mu) = \phi_1 (r_{t-\ell} - \mu)(r_{t-1} - \mu) + \phi_2 (r_{t-\ell} - \mu)(r_{t-2} - \mu) + (r_{t-\ell} - \mu) a_t.$$

Taking expectation and using $E[(r_{t-\ell} - \mu) a_t] = 0$ for $\ell > 0$, we obtain

$$\gamma_\ell = \phi_1 \gamma_{\ell-1} + \phi_2 \gamma_{\ell-2}, \quad \text{for } \ell > 0.$$

This result is referred to as the moment equation of a stationary AR(2) model.
Dividing the above equation by $\gamma_0$, we have the property

$$\rho_\ell = \phi_1 \rho_{\ell-1} + \phi_2 \rho_{\ell-2}, \quad \text{for } \ell > 0, \tag{2.13}$$

for the ACF of $r_t$. In particular, the lag-1 ACF satisfies

$$\rho_1 = \phi_1 \rho_0 + \phi_2 \rho_{-1} = \phi_1 + \phi_2 \rho_1.$$

Therefore, for a stationary AR(2) series $r_t$, we have $\rho_0 = 1$,

$$\rho_1 = \frac{\phi_1}{1 - \phi_2},$$

$$\rho_\ell = \phi_1 \rho_{\ell-1} + \phi_2 \rho_{\ell-2}, \quad \ell \ge 2.$$

The result of Eq. (2.13) says that the ACF of a stationary AR(2) series satisfies the
second-order difference equation

$$(1 - \phi_1 B - \phi_2 B^2)\rho_\ell = 0,$$

where $B$ is called the back-shift operator such that $B\rho_\ell = \rho_{\ell-1}$. This difference
equation determines the properties of the ACF of a stationary AR(2) time series.
It also determines the behavior of the forecasts of $r_t$. In the time series literature,
some people use the notation $L$ instead of $B$ for the back-shift operator. Here
$L$ stands for lag operator. For instance, $L r_t = r_{t-1}$ and $L \psi_k = \psi_{k-1}$.
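
The recursion for the ACF of a stationary AR(2) model can be iterated directly. In this sketch the coefficients $\phi_1 = 0.6$ and $\phi_2 = -0.4$ are assumed for illustration, and R's ARMAacf serves as an independent check.

```r
phi <- c(0.6, -0.4)            # assumed (phi_1, phi_2)

# rho[l + 1] stores rho_l because R vectors are indexed from 1.
rho <- numeric(11)
rho[1] <- 1                            # rho_0
rho[2] <- phi[1] / (1 - phi[2])        # rho_1 = phi_1 / (1 - phi_2)
for (l in 2:10) {
  rho[l + 1] <- phi[1] * rho[l] + phi[2] * rho[l - 1]   # Eq. (2.13)
}

# Theoretical ACF from R for comparison, lags 0 through 10.
rho_check <- as.vector(ARMAacf(ar = phi, lag.max = 10))
```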




Corresponding to the prior difference equation, there is a second-order polynomial
equation:

$$1 - \phi_1 x - \phi_2 x^2 = 0. \tag{2.14}$$

Solutions of this equation are

$$x = \frac{\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{-2\phi_2}.$$

In the time series literature, inverses of the two solutions are referred to as the
characteristic roots of the AR(2) model. Denote the two characteristic roots by
$\omega_1$ and $\omega_2$. If both $\omega_i$ are real valued, then the second-order difference equation
of the model can be factored as $(1 - \omega_1 B)(1 - \omega_2 B)$ and the AR(2) model can
be regarded as an AR(1) model operating on top of another AR(1) model. The
ACF of $r_t$ is then a mixture of two exponential decays. If $\phi_1^2 + 4\phi_2 < 0$, then $\omega_1$
and $\omega_2$ are complex numbers (called a complex-conjugate pair), and the plot of
ACF of $r_t$ would show a picture of damping sine and cosine waves. In business and
economic applications, complex characteristic roots are important. They give rise to
the behavior of business cycles. It is then common for economic time series models
to have complex-valued characteristic roots. For an AR(2) model in Eq. (2.12) with
a pair of complex characteristic roots, the average length of the stochastic cycles is

$$k = \frac{2\pi}{\cos^{-1}[\phi_1 / (2\sqrt{-\phi_2})]},$$

where the cosine inverse is stated in radians. If one writes the complex characteristic
roots as $a \pm bi$, where $i = \sqrt{-1}$, then we have $\phi_1 = 2a$, $\phi_2 = -(a^2 + b^2)$, and

$$k = \frac{2\pi}{\cos^{-1}(a / \sqrt{a^2 + b^2})},$$

where $\sqrt{a^2 + b^2}$ is the absolute value of $a \pm bi$. The same value of $k$ is obtained when
$a \pm bi$ denotes the solutions of Eq. (2.14) instead, because taking inverses leaves the
ratio $a / \sqrt{a^2 + b^2}$ unchanged. See Example 2.1 for an illustration.

Figure 2.4 shows the ACF of four stationary AR(2) models. Part (b) is the ACF
of the AR(2) model $(1 - 0.6B + 0.4B^2) r_t = a_t$. Because $\phi_1^2 + 4\phi_2 = 0.36 + 4 \times
(-0.4) = -1.24 < 0$, this particular AR(2) model contains two complex characteristic
roots, and hence its ACF exhibits damping sine and cosine waves. The
other three AR(2) models have real-valued characteristic roots. Their ACFs decay
exponentially.
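
For the model in part (b), the characteristic roots and the implied cycle length can be computed as in the sketch below. It uses polyroot, which takes polynomial coefficients in increasing order of power; the variable names are illustrative.

```r
phi <- c(0.6, -0.4)                       # (phi_1, phi_2) of part (b)

# Sign of phi_1^2 + 4 * phi_2 decides between real and complex roots.
disc <- phi[1]^2 + 4 * phi[2]

# Solutions of 1 - phi_1 x - phi_2 x^2 = 0 and their inverses.
sol <- polyroot(c(1, -phi[1], -phi[2]))
char_roots <- 1 / sol                     # characteristic roots

# Average cycle length from the coefficients and from a characteristic root.
k_formula <- 2 * pi / acos(phi[1] / (2 * sqrt(-phi[2])))
k_roots   <- 2 * pi / acos(Re(char_roots[1]) / Mod(char_roots[1]))
```

Both expressions give the same cycle length, since the ratio of the real part to the modulus depends only on $\phi_1$ and $\phi_2$.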


Figure 2.4 Autocorrelation function of an AR(2) model: (a) $\phi_1 = 1.2$ and $\phi_2 = -0.35$, (b) $\phi_1 = 0.6$
and $\phi_2 = -0.4$, (c) $\phi_1 = 0.2$ and $\phi_2 = 0.35$, and (d) $\phi_1 = -0.2$ and $\phi_2 = 0.35$.


Example 2.1. As an illustration, consider the quarterly growth rate of U.S.
real gross national product (GNP), seasonally adjusted, from the second quarter
of 1947 to the first quarter of 1991. This series shown in Figure 2.5 is also used
in Chapter 4 as an example of nonlinear economic time series. Here we simply
employ an AR(3) model for the data. Denoting the growth rate by $r_t$, we can use
the model building procedure of the next subsection to estimate the model. The
fitted model is

$$r_t = 0.0047 + 0.348 r_{t-1} + 0.179 r_{t-2} - 0.142 r_{t-3} + a_t, \quad \hat{\sigma}_a = 0.0097. \tag{2.15}$$

Rewriting the model as

$$r_t - 0.348 r_{t-1} - 0.179 r_{t-2} + 0.142 r_{t-3} = 0.0047 + a_t,$$

we obtain a corresponding third-order difference equation

$$1 - 0.348B - 0.179B^2 + 0.142B^3 = 0,$$

which can be factored approximately as

$$(1 + 0.521B)(1 - 0.869B + 0.274B^2) = 0.$$

The first factor $(1 + 0.521B)$ shows an exponentially decaying feature of the GNP
growth rate. Focusing on the second-order factor $1 - 0.869B - (-0.274)B^2 = 0$,
we have $\phi_1^2 + 4\phi_2 = 0.869^2 + 4(-0.274) = -0.341 < 0$. Therefore, the second
factor of the AR(3) model confirms the existence of stochastic business cycles
in the quarterly growth rate of U.S. real GNP. This is reasonable as the U.S.
economy went through expansion and contraction periods. The average length of
the stochastic cycles is approximately

$$k = \frac{2(3.14159)}{\cos^{-1}[\phi_1 / (2\sqrt{-\phi_2})]} = 10.62 \text{ quarters},$$

which is about 3 years. If one uses a nonlinear model to separate the U.S. economy
into "expansion" and "contraction" periods, the data show that the average duration
of contraction periods is about three quarters and that of expansion periods is about
3 years; see the analysis in Chapter 4. The average duration of 10.62 quarters is
a compromise between the two separate durations. The periodic feature obtained
here is common among growth rates of national economies. For example, similar
features can be found for many OECD (Organization for Economic Cooperation
and Development) countries.

Figure 2.5 Time plot of growth rate of U.S. quarterly real GNP from 1947.II to 1991.I. Data are
seasonally adjusted and in percentages.


R Demonstration 
The R demonstration for Example 2.1, where % denotes explanation, follows:


> gnp=scan(file='dgnp82.txt') % Load data
% To create a time-series object
> gnp1=ts(gnp,frequency=4,start=c(1947,2))
> plot(gnp1)
> points(gnp1,pch='*')
> m1=ar(gnp,method="mle") % Find the AR order
> m1$order % An AR(3) is selected based on AIC
[1] 3
> m2=arima(gnp,order=c(3,0,0)) % Estimation
> m2
Call:
arima(x = gnp, order = c(3, 0, 0))

Coefficients:
         ar1     ar2      ar3  intercept
      0.3480  0.1793  -0.1423     0.0077
s.e.  0.0745  0.0778   0.0745     0.0012

sigma^2 estimated as 9.427e-05: log likelihood=565.84,
aic=-1121.68

% In R, "intercept" denotes the mean of the series.
% Therefore, the constant term is obtained below:
> (1-.348-.1793+.1423)*0.0077
[1] 0.0047355
> sqrt(m2$sigma2) % Residual standard error
[1] 0.009709322
> p1=c(1,-m2$coef[1:3]) % Characteristic equation
> roots=polyroot(p1) % Find solutions
> roots
[1] 1.590253+1.063882i -1.920152+0.000000i 1.590253-1.063882i
> Mod(roots) % Compute the absolute values of the solutions
[1] 1.913308 1.920152 1.913308
% To compute average length of business cycles:
> k=2*pi/acos(1.590253/1.913308)
> k
[1] 10.65638


Stationarity 

The stationarity condition of an AR(2) time series is that the absolute values of
its two characteristic roots are less than 1, that is, its two characteristic roots
are less than 1 in modulus. Equivalently, the two solutions of the characteristic
equation are greater than 1 in modulus. Under such a condition, the recursive
equation in (2.13) ensures that the ACF of the model converges to 0 as the
lag $\ell$ increases. This convergence property is a necessary condition for a
stationary time series. In fact, the condition also applies to the AR(1) model where
the polynomial equation is $1 - \phi_1 x = 0$. The characteristic root is $\omega = 1/x = \phi_1$,
which must be less than 1 in modulus for $r_t$ to be stationary. As shown before,
$\rho_\ell = \phi_1^\ell$ for a stationary AR(1) model. The condition implies that $\rho_\ell \to 0$ as
$\ell \to \infty$.
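The stationarity condition can be checked numerically. The following is a minimal sketch, not part of the original demonstration: the helper names `ar_is_stationary` and `char_roots` are hypothetical, and the coefficients are the rounded AR(3) estimates of Example 2.1.

```r
# Solutions of 1 - phi_1 x - ... - phi_p x^p = 0 must all exceed 1 in modulus;
# equivalently, the characteristic roots (their inverses) must be below 1.
ar_is_stationary <- function(phi) {
  solutions <- polyroot(c(1, -phi))
  all(Mod(solutions) > 1)
}

char_roots <- function(phi) 1 / polyroot(c(1, -phi))

phi_gnp <- c(0.348, 0.179, -0.142)   # rounded estimates from Eq. (2.15)
ar_is_stationary(phi_gnp)            # stationary fit
Mod(char_roots(phi_gnp))             # all moduli are below 1

ar_is_stationary(1.1)                # explosive AR(1): solution 1/1.1 lies inside the unit circle
```

Working with the solutions of the polynomial equation rather than their inverses mirrors what `polyroot` returns directly, so no division is needed for the check itself.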




AR(p) Model 
The results of the AR(1) and AR(2) models can readily be generalized to the
general AR(p) model in Eq. (2.9). The mean of a stationary series is

$$E(r_t) = \frac{\phi_0}{1 - \phi_1 - \cdots - \phi_p},$$

provided that the denominator is not zero. The associated characteristic equation
of the model is

$$1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p = 0.$$

If all the solutions of this equation are greater than 1 in modulus, then the series
$r_t$ is stationary. Again, inverses of the solutions are the characteristic roots of the
model. Thus, stationarity requires that all characteristic roots are less than 1 in
modulus. For a stationary AR(p) series, the ACF satisfies the difference equation

$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)\rho_\ell = 0, \quad \text{for } \ell > 0.$$

The plot of ACF of a stationary AR(p) model would then show a mixture of
damping sine and cosine patterns and exponential decays depending on the nature
of its characteristic roots.
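As a hedged illustration of these two results, the sketch below computes the implied mean and the theoretical ACF of the fitted GNP model using the rounded coefficients of Eq. (2.15), and then verifies the difference equation lag by lag. The variable names are illustrative only.

```r
phi0 <- 0.0047
phi  <- c(0.348, 0.179, -0.142)

# Mean of the stationary series: phi0 / (1 - phi_1 - ... - phi_p)
mu <- phi0 / (1 - sum(phi))

# Theoretical ACF at lags 0, 1, ..., 12 (element l + 1 holds lag l)
rho <- ARMAacf(ar = phi, lag.max = 12)

# Residual of the difference equation rho_l - sum_i phi_i rho_{l-i} for l = 3..12
diff_eq <- sapply(3:12, function(l) rho[l + 1] - sum(phi * rho[l - (1:3) + 1]))

mu
round(rho, 4)
max(abs(diff_eq))   # numerically zero
```

The implied mean agrees, up to rounding of the coefficients, with the value reported as "intercept" in the R demonstration of Example 2.1.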


2.4.2 Identifying AR Models in Practice 


In application, the order p of an AR time series is unknown. It must be specified 
empirically. This is referred to as the order determination (or order specification) 
of AR models, and it has been extensively studied in the time series literature. Two 
general approaches are available for determining the value of p. The first approach 
is to use the partial autocorrelation function, and the second approach uses some 
information criteria. 


Partial Autocorrelation Function (PACF) 

The PACF of a stationary time series is a function of its ACF and is a useful
tool for determining the order p of an AR model. A simple, yet effective way to
introduce PACF is to consider the following AR models in consecutive orders:

$$\begin{aligned}
r_t &= \phi_{0,1} + \phi_{1,1} r_{t-1} + e_{1t}, \\
r_t &= \phi_{0,2} + \phi_{1,2} r_{t-1} + \phi_{2,2} r_{t-2} + e_{2t}, \\
r_t &= \phi_{0,3} + \phi_{1,3} r_{t-1} + \phi_{2,3} r_{t-2} + \phi_{3,3} r_{t-3} + e_{3t}, \\
r_t &= \phi_{0,4} + \phi_{1,4} r_{t-1} + \phi_{2,4} r_{t-2} + \phi_{3,4} r_{t-3} + \phi_{4,4} r_{t-4} + e_{4t}, \\
&\;\;\vdots
\end{aligned}$$

where $\phi_{0,j}$, $\phi_{i,j}$, and $\{e_{jt}\}$ are, respectively, the constant term, the coefficient of
$r_{t-i}$, and the error term of an AR(j) model. These models are in the form of a
multiple linear regression and can be estimated by the least-squares method. As a
matter of fact, they are arranged in a sequential order that enables us to apply the
idea of partial F test in multiple linear regression analysis. The estimate $\hat{\phi}_{1,1}$ of
the first equation is called the lag-1 sample PACF of $r_t$. The estimate $\hat{\phi}_{2,2}$ of the
second equation is the lag-2 sample PACF of $r_t$. The estimate $\hat{\phi}_{3,3}$ of the third
equation is the lag-3 sample PACF of $r_t$, and so on.

From the definition, the lag-2 PACF $\hat{\phi}_{2,2}$ shows the added contribution of $r_{t-2}$
to $r_t$ over the AR(1) model $r_t = \phi_0 + \phi_1 r_{t-1} + e_{1t}$. The lag-3 PACF shows the
added contribution of $r_{t-3}$ to $r_t$ over an AR(2) model, and so on. Therefore, for
an AR(p) model, the lag-p sample PACF should not be zero, but $\hat{\phi}_{j,j}$ should be
close to zero for all $j > p$. We make use of this property to determine the order
p. For a stationary Gaussian AR(p) model, it can be shown that the sample PACF
has the following properties:


• $\hat{\phi}_{p,p}$ converges to $\phi_p$ as the sample size T goes to infinity.
• $\hat{\phi}_{\ell,\ell}$ converges to zero for all $\ell > p$.
• The asymptotic variance of $\hat{\phi}_{\ell,\ell}$ is 1/T for $\ell > p$.


These results say that, for an AR(p) series, the sample PACF cuts off at
lag p.
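The regression definition of the sample PACF can be sketched directly in R. The example below is illustrative only: it uses a simulated AR(3) series (not the data of this chapter), a hypothetical helper `pacf_ols`, and compares the least-squares values with R's built-in `pacf` and with the theoretical PACF from `ARMAacf`.

```r
set.seed(1)
phi <- c(0.348, 0.179, -0.142)
r <- arima.sim(model = list(ar = phi), n = 2000)   # simulated AR(3) series

# Lag-j sample PACF = estimated coefficient of r_{t-j} in an AR(j) least-squares fit
pacf_ols <- function(x, max_lag) {
  sapply(seq_len(max_lag), function(j) {
    z <- embed(as.numeric(x), j + 1)   # column 1 is r_t, columns 2..j+1 are r_{t-1..t-j}
    fit <- lm(z[, 1] ~ z[, -1])
    unname(coef(fit)[j + 1])           # last slope is the lag-j coefficient
  })
}

ols_pacf  <- pacf_ols(r, 6)
yw_pacf   <- as.numeric(pacf(r, lag.max = 6, plot = FALSE)$acf)
theo_pacf <- ARMAacf(ar = phi, lag.max = 6, pacf = TRUE)

round(rbind(ols = ols_pacf, builtin = yw_pacf, theory = theo_pacf), 3)
2 / sqrt(length(r))   # approximate two-standard-error limit for lags beyond p
```

The built-in `pacf` is computed from the sample ACF rather than from successive regressions, so the two sample versions differ slightly in finite samples while sharing the same cutoff pattern after lag 3.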

As an example, consider the monthly simple returns of CRSP value-weighted
index from January 1926 to December 2008. Table 2.1 gives the first 12 lags of
sample PACF of the series. With T = 996, the asymptotic standard error of the
sample PACF is approximately 0.032. Therefore, using the 5% significance level,
we identify an AR(3) or AR(9) model for the data (i.e., p = 3 or 9). If the 1%
significance level is used, we specify an AR(3) model.

As another example, Figure 2.6 shows the PACF of the GNP growth rate series
of Example 2.1. The two dotted lines of the plot denote the approximate two
standard error limits $\pm 2/\sqrt{176}$. The plot suggests an AR(3) model for the data
because the first three lags of sample PACF appear to be large.


TABLE 2.1 Sample Partial Autocorrelation Function and Some Information
Criteria for the Monthly Simple Returns of CRSP Value-Weighted Index from
January 1926 to December 2008

p          1       2       3       4       5       6
PACF   0.115  -0.030  -0.102   0.033   0.062  -0.050
AIC   -5.838  -5.837  -5.846  -5.845  -5.847  -5.847
BIC   -5.833  -5.827  -5.831  -5.825  -5.822  -5.818

p          7       8       9      10      11      12
PACF   0.031   0.052   0.063   0.005  -0.005   0.011
AIC   -5.846  -5.847  -5.849  -5.847  -5.845  -5.843
BIC   -5.812  -5.807  -5.805  -5.798  -5.791  -5.784

Figure 2.6 Sample partial autocorrelation function of U.S. quarterly real GNP growth rate from 1947.II
to 1991.I. Dotted lines give approximate pointwise 95% confidence interval.


Information Criteria 

There are several information criteria available to determine the order p of an AR
process. All of them are likelihood based. For example, the well-known Akaike
information criterion (AIC) (Akaike, 1973) is defined as

$$\text{AIC} = \frac{-2}{T}\ln(\text{likelihood}) + \frac{2}{T} \times (\text{number of parameters}), \tag{2.16}$$

where the likelihood function is evaluated at the maximum-likelihood estimates
and T is the sample size. For a Gaussian AR($\ell$) model, AIC reduces to

$$\text{AIC}(\ell) = \ln(\tilde{\sigma}_\ell^2) + \frac{2\ell}{T},$$

where $\tilde{\sigma}_\ell^2$ is the maximum-likelihood estimate of $\sigma_a^2$, which is the variance of $a_t$,
and T is the sample size; see Eq. (1.18). The first term of the AIC in Eq. (2.16)
measures the goodness of fit of the AR($\ell$) model to the data, whereas the second
term is called the penalty function of the criterion because it penalizes a candidate
model by the number of parameters used. Different penalty functions result in
different information criteria.

Another commonly used criterion function is the Schwarz-Bayesian information
criterion (BIC). For a Gaussian AR($\ell$) model, the criterion is

$$\text{BIC}(\ell) = \ln(\tilde{\sigma}_\ell^2) + \frac{\ell \ln(T)}{T}.$$

The penalty for each parameter used is 2 for AIC and ln(T) for BIC. Thus,
compared with AIC, BIC tends to select a lower-order AR model when the sample size is
moderate or large.


Selection Rule 

To use AIC to select an AR model in practice, one computes AIC($\ell$) for $\ell =
0, \ldots, P$, where P is a prespecified positive integer, and selects the order k that has
the minimum AIC value. The same rule applies to BIC.

Table 2.1 also gives the AIC and BIC for p = 1,..., 12. The AIC values are 
close to each other with minimum —5.849 occurring at p = 9, suggesting that an 
AR(9) model is preferred by the criterion. The BIC, on the other hand, attains 
its minimum value —5.833 at p = 1 with —5.831 as a close second at p = 3. 
Thus, the BIC selects an AR(1) model for the value-weighted return series. This 
example shows that different approaches or criteria to order determination may 
result in different choices of p. There is no evidence to suggest that one approach 
outperforms the other in a real application. Substantive information of the problem 
under study and simplicity are two factors that also play an important role in 
choosing an AR model for a given time series. 
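The selection rule can be applied mechanically to the criteria listed in Table 2.1. The sketch below simply retypes the tabulated AIC and BIC values (rounded to three decimals) and also checks that their difference matches the penalty gap $\ell[\ln(T) - 2]/T$ implied by the two formulas; the variable names are illustrative.

```r
T_obs <- 996
lag <- 1:12
aic <- c(-5.838, -5.837, -5.846, -5.845, -5.847, -5.847,
         -5.846, -5.847, -5.849, -5.847, -5.845, -5.843)
bic <- c(-5.833, -5.827, -5.831, -5.825, -5.822, -5.818,
         -5.812, -5.807, -5.805, -5.798, -5.791, -5.784)

aic_order <- lag[which.min(aic)]   # order chosen by AIC
bic_order <- lag[which.min(bic)]   # order chosen by BIC

# BIC(l) - AIC(l) = l * (ln(T) - 2) / T, since both share the term ln(sigma_l^2)
gap_formula <- lag * (log(T_obs) - 2) / T_obs
gap_error <- max(abs((bic - aic) - gap_formula))

c(AIC = aic_order, BIC = bic_order)
gap_error   # small; limited only by the three-decimal rounding of the table
```

Because the gap grows linearly in $\ell$, the BIC curve is tilted upward relative to the AIC curve, which is why its minimum occurs at a lower order here.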

Again, consider the growth rate series of U.S. quarterly real GNP of
Example 2.1. The AIC obtained from R also identifies an AR(3) model. Note that
the AIC value of the ar command in R has been adjusted so that the minimum
AIC is zero.


> gnp=scan(file='q-gnp4791.txt')
> ord=ar(gnp,method="mle")
> ord$aic
[1] 27.847 2.742 1.603 0.000 0.323 2.243
[7] 4.052 6.025 5.905 7.572 7.895 9.679
> ord$order
[1] 3


Parameter Estimation 
For a specified AR(p) model in Eq. (2.9), the conditional least-squares method,
which starts with the (p + 1)th observation, is often used to estimate the parameters.
Specifically, conditioning on the first p observations, we have

$$r_t = \phi_0 + \phi_1 r_{t-1} + \cdots + \phi_p r_{t-p} + a_t, \quad t = p+1, \ldots, T,$$

which is in the form of a multiple linear regression and can be estimated by the
least-squares method. Denote the estimate of $\phi_i$ by $\hat{\phi}_i$. The fitted model is

$$\hat{r}_t = \hat{\phi}_0 + \hat{\phi}_1 r_{t-1} + \cdots + \hat{\phi}_p r_{t-p},$$

and the associated residual is

$$\hat{a}_t = r_t - \hat{r}_t.$$

The series $\{\hat{a}_t\}$ is called the residual series, from which we obtain

$$\hat{\sigma}_a^2 = \frac{\sum_{t=p+1}^{T} \hat{a}_t^2}{T - 2p - 1}.$$

If the conditional-likelihood method is used, the estimates of $\phi_i$ remain unchanged,
but the estimate of $\sigma_a^2$ becomes $\tilde{\sigma}_a^2 = \hat{\sigma}_a^2 \times (T - 2p - 1)/(T - p)$. In some
packages, $\tilde{\sigma}_a^2$ is defined as $\hat{\sigma}_a^2 \times (T - 2p - 1)/T$. For illustration, consider an AR(3)
model for the monthly simple returns of the value-weighted index in Table 2.1.


The fitted model is

$$r_t = 0.0091 + 0.116 r_{t-1} - 0.019 r_{t-2} - 0.104 r_{t-3} + \hat{a}_t, \quad \hat{\sigma}_a = 0.054.$$


The standard errors of the coefficients are 0.002, 0.032, 0.032, and 0.032, respec- 
tively. Except for the lag-2 coefficient, all parameters are statistically significant at 
the 1% level. 
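The conditional least-squares method amounts to an ordinary regression on lagged values. The following sketch shows one way to set it up in R; it uses a simulated series of length 996 as a stand-in (the CRSP data file is not assumed to be available), and the coefficient values passed to `arima.sim` are illustrative.

```r
set.seed(11)
p <- 3
r <- as.numeric(arima.sim(model = list(ar = c(0.12, -0.02, -0.10)), n = 996))
T_obs <- length(r)

# Rows correspond to t = p+1, ..., T; column 1 is r_t, columns 2..p+1 are r_{t-1}, ..., r_{t-p}
z <- embed(r, p + 1)
fit <- lm(z[, 1] ~ z[, -1])

a_hat <- residuals(fit)
sigma2_hat   <- sum(a_hat^2) / (T_obs - 2 * p - 1)                # least-squares estimate
sigma2_tilde <- sigma2_hat * (T_obs - 2 * p - 1) / (T_obs - p)    # conditional-likelihood version

coef(fit)
c(df = df.residual(fit), sigma_hat = sqrt(sigma2_hat), sigma_tilde = sqrt(sigma2_tilde))
```

The residual degrees of freedom are $T - 2p - 1$ because the regression uses $T - p$ observations and estimates $p + 1$ coefficients; with T = 996 and p = 3 this gives 989.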

For this example, the AR coefficients of the fitted model are small, indicating that
the serial dependence of the series is weak, even though it is statistically significant
at the 1% level. The significance of $\hat{\phi}_0$ of the entertained model implies that the
expected mean return of the series is positive. In fact, $\hat{\mu} = 0.0091/(1 - 0.116 +
0.019 + 0.104) = 0.009$, which is small but has an important long-term implication.
It implies that the long-term return of the index can be substantial. Using the
multiperiod simple return defined in Chapter 1, the average annual simple gross
return is $[\prod_{t=1}^{996} (1 + R_t)]^{12/996} - 1 \approx 0.093$. In other words, the CRSP
value-weighted index grew about 9.3% per annum from 1926
to 2008, supporting the common belief that the equity market performs well in the
long term. A one-dollar investment at the beginning of 1926 would be worth about
$1593 at the end of 2008.


> vw=read.table('m-ibm3dx.txt',header=T)[,3]
> t1=prod(vw+1)
> t1
[1] 1592.953
> t1^(12/996)-1
[1] 0.0929


Model Checking 

A fitted model must be examined carefully to check for possible model inadequacy.
If the model is adequate, then the residual series should behave as a white noise.
The ACF and the Ljung-Box statistics in Eq. (2.3) of the residuals can be used to
check the closeness of $\hat{a}_t$ to a white noise. For an AR(p) model, the Ljung-Box
statistic Q(m) follows asymptotically a chi-squared distribution with m − g degrees
of freedom, where g denotes the number of AR coefficients used in the model. The
adjustment in the degrees of freedom is made based on the number of constraints
added to the residuals $\hat{a}_t$ from fitting the AR(p) to an AR(0) model. If a fitted
model is found to be inadequate, it must be refined. For instance, if some of the
estimated AR coefficients are not significantly different from zero, then the model
should be simplified by trying to remove those insignificant parameters. If residual
ACF shows additional serial correlations, then the model should be extended to
take care of those correlations.


Remark. Most time series packages do not adjust the degrees of freedom when 
applying the Ljung—Box statistics Q (m) to a residual series. This is understandable 
when m < g. 


Consider the residual series of the fitted AR(3) model for the monthly
value-weighted simple returns. We have Q(12) = 16.35 with a p value 0.060 based on
its asymptotic chi-squared distribution with 9 degrees of freedom. Thus, the null
hypothesis of no residual serial correlation in the first 12 lags is narrowly not rejected
at the 5% level. However, since the lag-2 AR coefficient is not significant at the
5% level, one can refine the model as

$$r_t = 0.0088 + 0.114 r_{t-1} - 0.106 r_{t-3} + a_t, \quad \hat{\sigma}_a = 0.0536,$$

where all the estimates are now significant at the 1% level. The residual series
gives Q(12) = 16.83 with a p value 0.078 (based on $\chi^2_{10}$). The model is adequate
in modeling the dynamic linear dependence of the data.
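The degrees-of-freedom adjustment can also be requested directly through the `fitdf` argument of `Box.test`, instead of recomputing the p value by hand. The sketch below uses a simulated AR(3) series as a stand-in for the return data; only the final line refers to the statistic Q(12) = 16.35 quoted in the text.

```r
set.seed(42)
x <- arima.sim(model = list(ar = c(0.3, 0.2, -0.1)), n = 500)   # illustrative series
fit <- arima(x, order = c(3, 0, 0))

# Unadjusted test: chi-squared with 12 degrees of freedom
lb_raw <- Box.test(fit$residuals, lag = 12, type = "Ljung-Box")
# Adjusted test: g = 3 AR coefficients were fitted, so df = 12 - 3 = 9
lb_adj <- Box.test(fit$residuals, lag = 12, type = "Ljung-Box", fitdf = 3)

c(df_raw = unname(lb_raw$parameter), df_adj = unname(lb_adj$parameter))
c(p_raw = lb_raw$p.value, p_adj = lb_adj$p.value)

# p value for Q(12) = 16.35 with 9 degrees of freedom, as quoted in the text
pchisq(16.35, df = 9, lower.tail = FALSE)
```

The statistic itself is unchanged by `fitdf`; only the reference distribution, and therefore the p value, differs.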


R Demonstration 
In the following R demonstration, % denotes an explanation:


> vw=read.table('m-ibm3dx2608.txt',header=T)[,3]
> m3=arima(vw,order=c(3,0,0))
> m3
Call:
arima(x = vw, order = c(3, 0, 0))
Coefficients:
         ar1      ar2      ar3  intercept
      0.1158  -0.0187  -0.1042     0.0089
s.e.  0.0315   0.0317   0.0317     0.0017

sigma^2 estimated as 0.002875: log likelihood=1500.86,
aic=-2991.73

> (1-.1158+.0187+.1042)*mean(vw) % Compute the intercept phi(0).
[1] 0.00896761
> sqrt(m3$sigma2) % Compute standard error of residuals
[1] 0.0536189
> Box.test(m3$residuals,lag=12,type='Ljung')
Box-Ljung test
data: m3$residuals % R uses 12 degrees of freedom
X-squared = 16.3525, df = 12, p-value = 0.1756

> pv=1-pchisq(16.35,9) % Compute p-value using 9 degrees of freedom
> pv
[1] 0.05992276

% To fix the AR(2) coef to zero:
> m3=arima(vw,order=c(3,0,0),fixed=c(NA,0,NA,NA))
% The subcommand 'fixed' is used to fix parameter values,
% where NA denotes estimation and 0 means fixing the
% parameter to 0.
% The ordering of the parameters can be found using m3$coef.
> m3
Call:
arima(x = vw, order = c(3, 0, 0), fixed = c(NA, 0, NA, NA))

Coefficients:
         ar1  ar2      ar3  intercept
      0.1136    0  -0.1063     0.0089
s.e.  0.0313    0   0.0315     0.0017

sigma^2 estimated as 0.002876: log likelihood=1500.69,
aic=-2993.38
> (1-.1136+.1063)*.0089 % Compute phi(0)
[1] 0.00883503
> sqrt(m3$sigma2) % Compute residual standard error
[1] 0.05362832
> Box.test(m3$residuals,lag=12,type='Ljung')
Box-Ljung test
data: m3$residuals
X-squared = 16.8276, df = 12, p-value = 0.1562
> pv=1-pchisq(16.83,10)
> pv
[1] 0.0782113


S-Plus Demonstration 
The following S-Plus output has been edited:


> vw=read.table('m-ibm3dx2608.txt',header=T)[,3]
> ar3=OLS(vw~ar(3))
> summary(ar3)
Call:
OLS(formula = vw ~ ar(3))


Residuals:
     Min      1Q  Median      3Q     Max
 -0.2863 -0.0263  0.0034  0.0297  0.3689


Coefficients: 
Value Std. Error t value Pr(>|t|) 
(Intercept) 0.0091 0.0018 5.1653 0.0000 
lag1 0.1148 0.0316 3.6333 0.0003 
lag2 -0.0188 0.0318 -0.5894 0.5557 
lag3 -0.1043 0.0318 -3.2763 0.0011 


Regression Diagnostics: 
R-Squared 0.0246 

Adjusted R-Squared 0.0216 

Durbin-Watson Stat 1.9913 


Residual Diagnostics: 
Stat P-Value 
Jarque-Bera 1656.3928 0.0000 
Ljung-Box 50.1279 0.0087 


Residual standard error: 0.05375 on 989 degrees of freedom 
> autocorTest (ar3$residuals, lag=12) 


Test for Autocorrelation: Ljung-Box 
Null Hypothesis: no autocorrelation 


Test Statistics: 
Test Stat 16.5668 
p-value 0.1666 % S-Plus uses 12 degrees of freedom 


Dist. under Null: chi-square with 12 degrees of freedom 
Total Observ.: 993 

> 1-pchisq(16.57,9) % Compute p-value with 9 degrees 
of freedom 

[1] 0.05589128 


2.4.3 Goodness of Fit 


A commonly used statistic to measure goodness of fit of a stationary model is the
R square ($R^2$) defined as

$$R^2 = 1 - \frac{\text{residual sum of squares}}{\text{total sum of squares}}.$$

For a stationary AR(p) time series model with T observations $\{r_t \mid t = 1, \ldots, T\}$,
the measure becomes

$$R^2 = 1 - \frac{\sum_{t=p+1}^{T} \hat{a}_t^2}{\sum_{t=p+1}^{T} (r_t - \bar{r})^2},$$

where $\bar{r} = \sum_{t=p+1}^{T} r_t/(T - p)$. It is easy to show that $0 \le R^2 \le 1$. Typically, a
larger $R^2$ indicates that the model provides a closer fit to the data. However, this is
only true for a stationary time series. For the unit-root nonstationary series discussed
later in this chapter, $R^2$ of an AR(1) fit converges to one when the sample size
increases to infinity, regardless of the true underlying model of $r_t$.

For a given data set, it is well known that $R^2$ is a nondecreasing function of
the number of parameters used. To overcome this weakness, an adjusted $R^2$ is
proposed, which is defined as

$$\text{Adj-}R^2 = 1 - \frac{\text{variance of residuals}}{\text{variance of } r_t} = 1 - \frac{\hat{\sigma}_a^2}{\hat{\sigma}_r^2},$$

where $\hat{\sigma}_r^2$ is the sample variance of $r_t$. This new measure takes into account the
number of parameters used in the fitted model. However, it is no longer between
0 and 1.
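Both measures are easy to compute from a conditional least-squares fit. The sketch below is illustrative: it uses a simulated AR(2) series, and it assumes that the sample variance of $r_t$ is taken over the same $T - p$ observations used in the regression, which is what makes the result agree with the adjusted $R^2$ reported by `lm`.

```r
set.seed(7)
p <- 2
r <- as.numeric(arima.sim(model = list(ar = c(0.5, -0.3)), n = 400))
T_obs <- length(r)

z <- embed(r, p + 1)
y <- z[, 1]            # r_t for t = p+1, ..., T
X <- z[, -1]           # r_{t-1}, ..., r_{t-p}
fit <- lm(y ~ X)
a_hat <- residuals(fit)

# R^2 = 1 - residual sum of squares / total sum of squares
r2 <- 1 - sum(a_hat^2) / sum((y - mean(y))^2)

# Adjusted R^2 = 1 - (variance of residuals) / (variance of r_t)
sigma2_a <- sum(a_hat^2) / (T_obs - 2 * p - 1)
adj_r2 <- 1 - sigma2_a / var(y)

c(R2 = r2, adjusted_R2 = adj_r2)
```

Since the residual variance divides by fewer degrees of freedom than the variance of the series, the adjusted measure is never larger than $R^2$ and can become negative for a poorly fitting model.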


2.4.4 Forecasting 


Forecasting is an important application of time series analysis. For the AR(p)
model in Eq. (2.9), suppose that we are at the time index h and are interested
in forecasting $r_{h+\ell}$, where $\ell \ge 1$. The time index h is called the forecast origin
and the positive integer $\ell$ is the forecast horizon. Let $F_h$ be the collection of
information available at the forecast origin h, and let $\hat{r}_h(\ell)$ be the forecast of $r_{h+\ell}$
using the minimum squared error loss function. In other words, the forecast $\hat{r}_h(\ell)$
is chosen such that

$$E\{[r_{h+\ell} - \hat{r}_h(\ell)]^2 \mid F_h\} \le \min_g E[(r_{h+\ell} - g)^2 \mid F_h],$$

where g is a function of the information available at time h (inclusive), that is,
a function of $F_h$. We refer to $\hat{r}_h(\ell)$ as the $\ell$-step-ahead forecast of $r_t$ at the
forecast origin h.


1-Step-Ahead Forecast 
From the AR(p) model, we have

$$r_{h+1} = \phi_0 + \phi_1 r_h + \cdots + \phi_p r_{h+1-p} + a_{h+1}.$$

Under the minimum squared error loss function, the point forecast of $r_{h+1}$ given
$F_h$ is the conditional expectation

$$\hat{r}_h(1) = E(r_{h+1} \mid F_h) = \phi_0 + \sum_{i=1}^{p} \phi_i r_{h+1-i},$$

and the associated forecast error is

$$e_h(1) = r_{h+1} - \hat{r}_h(1) = a_{h+1}.$$

Consequently, the variance of the 1-step-ahead forecast error is $\text{Var}[e_h(1)] =
\text{Var}(a_{h+1}) = \sigma_a^2$. If $a_t$ is normally distributed, then a 95% 1-step-ahead interval
forecast of $r_{h+1}$ is $\hat{r}_h(1) \pm 1.96 \times \sigma_a$. For the linear model in Eq. (2.4), $a_{t+1}$ is also
the 1-step-ahead forecast error at the forecast origin t. In the econometric literature,
$a_{t+1}$ is referred to as the shock to the series at time t + 1.

In practice, estimated parameters are often used to compute point and interval 
forecasts. This results in a conditional forecast because such a forecast does not 
take into consideration the uncertainty in the parameter estimates. In theory, one 
can consider parameter uncertainty in forecasting, but it is much more involved. A 
natural way to consider parameter and model uncertainty in forecasting is Bayesian 
forecasting with Markov chain Monte Carlo (MCMC) methods. See Chapter 12 for
further discussion. For simplicity, we assume that the model is given in this chapter. 
When the sample size used in estimation is sufficiently large, then the conditional 
forecast is close to the unconditional one. 


2-Step-Ahead Forecast 
Next consider the forecast of $r_{h+2}$ at the forecast origin h. From the AR(p) model,
we have

$$r_{h+2} = \phi_0 + \phi_1 r_{h+1} + \cdots + \phi_p r_{h+2-p} + a_{h+2}.$$

Taking conditional expectation, we have

$$\hat{r}_h(2) = E(r_{h+2} \mid F_h) = \phi_0 + \phi_1 \hat{r}_h(1) + \phi_2 r_h + \cdots + \phi_p r_{h+2-p}$$

and the associated forecast error

$$e_h(2) = r_{h+2} - \hat{r}_h(2) = \phi_1 [r_{h+1} - \hat{r}_h(1)] + a_{h+2} = a_{h+2} + \phi_1 a_{h+1}.$$

The variance of the forecast error is $\text{Var}[e_h(2)] = (1 + \phi_1^2)\sigma_a^2$. Interval forecasts
of $r_{h+2}$ can be computed in the same way as those for $r_{h+1}$. It is interesting to
see that $\text{Var}[e_h(2)] \ge \text{Var}[e_h(1)]$, meaning that as the forecast horizon increases
the uncertainty in forecast also increases. This is in agreement with common sense
that we are more uncertain about $r_{h+2}$ than $r_{h+1}$ at the time index h for a linear
time series.


Multistep-Ahead Forecast 
In general, we have

$$r_{h+\ell} = \phi_0 + \phi_1 r_{h+\ell-1} + \cdots + \phi_p r_{h+\ell-p} + a_{h+\ell}.$$

The $\ell$-step-ahead forecast based on the minimum squared error loss function is the
conditional expectation of $r_{h+\ell}$ given $F_h$, which can be obtained as

$$\hat{r}_h(\ell) = \phi_0 + \sum_{i=1}^{p} \phi_i \hat{r}_h(\ell - i),$$

where it is understood that $\hat{r}_h(i) = r_{h+i}$ if $i \le 0$. This forecast can be computed
recursively using forecasts $\hat{r}_h(i)$ for $i = 1, \ldots, \ell - 1$. The $\ell$-step-ahead forecast
error is $e_h(\ell) = r_{h+\ell} - \hat{r}_h(\ell)$. It can be shown that for a stationary AR(p) model,
$\hat{r}_h(\ell)$ converges to $E(r_t)$ as $\ell \to \infty$, meaning that for such a series the long-term point
forecast approaches its unconditional mean. This property is referred to as
mean reversion in the finance literature. The variance of the
forecast error then approaches the unconditional variance of $r_t$. For an AR(1) model,
the speed of mean reversion is measured by the half-life defined as
$\ell = \ln(0.5)/\ln(|\phi_1|)$. To see this, for the
AR(1) model in (2.8), let $x_t = r_t - E(r_t)$ be the mean-adjusted series. It is easy to
see that the $\ell$-step-ahead forecast of $x_{h+\ell}$ at the forecast origin h is $\hat{x}_h(\ell) = \phi_1^\ell x_h$.
The half-life is the forecast horizon such that $\hat{x}_h(\ell) = \frac{1}{2} x_h$. That is, $\phi_1^\ell = \frac{1}{2}$. Thus,
$\ell = \ln(0.5)/\ln(|\phi_1|)$.
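The half-life formula is a one-line computation. The sketch below uses a hypothetical helper `half_life` and two illustrative AR(1) coefficients that are not taken from any fitted model in this chapter.

```r
# Half-life of an AR(1) model: the horizon l at which |phi1|^l = 1/2
half_life <- function(phi1) log(0.5) / log(abs(phi1))

hl_fast <- half_life(0.5)   # deviation from the mean halves in one step
hl_slow <- half_life(0.9)   # more persistent series, longer half-life

c(phi_0.5 = hl_fast, phi_0.9 = hl_slow)
0.9^hl_slow                 # recovers 0.5 by construction
```

The half-life is generally not an integer; it is a summary of persistence rather than an actual forecast horizon.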

Table 2.2 contains the 1-step- to 12-step-ahead forecasts and the standard errors
of the associated forecast errors at the forecast origin 984 for the monthly simple
return of the value-weighted index using an AR(3) model that was reestimated
using the first 984 observations. The fitted model is

$$r_t = 0.0098 + 0.1024 r_{t-1} - 0.0201 r_{t-2} - 0.1090 r_{t-3} + a_t,$$

where $\hat{\sigma}_a = 0.054$. The actual returns of 2008 are also given in Table 2.2. Because
of the weak serial dependence in the series, the forecasts and standard deviations
of forecast errors converge to the sample mean and standard deviation of the data
quickly. For the first 984 observations, the sample mean and standard error are
0.0095 and 0.0540, respectively.
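The forecast recursion and the growth of the forecast standard errors can be sketched as follows. Several assumptions are made here: `ar_forecast` is a hypothetical helper, the three "last observations" are made-up values (the actual returns at the forecast origin are not listed in the text), and $\sigma_a$ is set to 0.0534, the 1-step standard error reported in Table 2.2. The standard errors use the standard result $\text{Var}[e_h(\ell)] = \sigma_a^2 \sum_{j=0}^{\ell-1} \psi_j^2$, where the $\psi_j$ are the MA-representation weights of the AR model; for $\ell = 1, 2$ this reduces to the expressions given above.

```r
# l-step-ahead forecasts: r_hat(l) = phi0 + sum_i phi_i * r_hat(l - i),
# with r_hat(i) = r_{h+i} for i <= 0
ar_forecast <- function(phi0, phi, last_obs, n_ahead) {
  p <- length(phi)
  path <- c(tail(last_obs, p), numeric(n_ahead))   # observed values, then forecasts
  for (l in seq_len(n_ahead)) {
    path[p + l] <- phi0 + sum(phi * path[p + l - seq_len(p)])
  }
  path[-seq_len(p)]
}

phi0 <- 0.0098
phi <- c(0.1024, -0.0201, -0.1090)
sigma_a <- 0.0534                      # 1-step standard error from Table 2.2
last_obs <- c(0.010, -0.020, 0.005)    # hypothetical r_{h-2}, r_{h-1}, r_h

fc <- ar_forecast(phi0, phi, last_obs, n_ahead = 12)

psi <- c(1, ARMAtoMA(ar = phi, lag.max = 11))   # psi_0, psi_1, ..., psi_11
se <- sigma_a * sqrt(cumsum(psi^2))             # standard errors for steps 1..12

mu <- phi0 / (1 - sum(phi))                     # unconditional mean
round(cbind(step = 1:12, forecast = fc, std_error = se), 4)
mu
```

Because the point forecasts depend on the made-up starting values, only their limit (the unconditional mean) and the standard errors are comparable with Table 2.2.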

Figure 2.7 shows the corresponding out-of-sample prediction plot for the
monthly simple return series of the value-weighted index. The forecast origin
t = 984 corresponds to December 2007. The prediction plot includes the two
standard error limits of the forecasts and the actual observed returns for 2008.
The forecasts and actual returns are marked by "o" and "•", respectively. From the
plot, except for the return of October 2008, all actual returns are within the 95%
prediction intervals.

TABLE 2.2 Multistep Ahead Forecasts of an AR(3) Model for Monthly Simple
Returns of CRSP Value-Weighted Index

Step              1        2        3        4        5        6
Forecast     0.0076   0.0161   0.0118   0.0099   0.0089   0.0093
Std. Error   0.0534   0.0537   0.0537   0.0540   0.0540   0.0540
Actual      -0.0623  -0.0220  -0.0105   0.0511   0.0238  -0.0786

Step              7        8        9       10       11       12
Forecast     0.0095   0.0097   0.0096   0.0096   0.0096   0.0096
Std. Error   0.0540   0.0540   0.0540   0.0540   0.0540   0.0540
Actual      -0.0132   0.0110  -0.0981  -0.1847  -0.0852   0.0215

The forecast origin is h = 984.

Figure 2.7 Plot of 1- to 12-step-ahead out-of-sample forecasts for monthly simple returns of CRSP
value-weighted index. Forecast origin is t = 984, which is December 2007. Forecasts are denoted by
"o" and actual observations by "•". Two dashed lines denote two standard error limits of the forecasts.
Axes: time (horizontal) and simple return (vertical).


2.5 SIMPLE MA MODELS 


We now turn to another class of simple models that are also useful in model- 
ing return series in finance. These models are the moving-average (MA) models. 
As is shown in Chapter 5, the bid—ask bounce in stock trading may introduce 
an MA(1) structure in a return series. There are several ways to introduce MA 
models. One approach is to treat the model as a simple extension of white noise 
series. Another approach is to treat the model as an infinite-order AR model with 
some parameter constraints. We adopt the second approach.

There is no particular reason, but simplicity, to assume a priori that the order
of an AR model is finite. We may entertain, at least in theory, an AR model with
infinite order as

$$r_t = \phi_0 + \phi_1 r_{t-1} + \phi_2 r_{t-2} + \cdots + a_t.$$

However, such an AR model is not realistic because it has infinitely many parameters.
One way to make the model practical is to assume that the coefficients $\phi_i$ satisfy
some constraints so that they are determined by a finite number of parameters. A
special case of this idea is

$$r_t = \phi_0 - \theta_1 r_{t-1} - \theta_1^2 r_{t-2} - \theta_1^3 r_{t-3} - \cdots + a_t, \tag{2.17}$$

where the coefficients depend on a single parameter $\theta_1$ via $\phi_i = -\theta_1^i$ for $i \ge 1$. For
the model in Eq. (2.17) to be stationary, $\theta_1$ must be less than 1 in absolute value;
otherwise, $\theta_1^i$ and the series will explode. Because $|\theta_1| < 1$, we have $\theta_1^i \to 0$ as
$i \to \infty$. Thus, the contribution of $r_{t-i}$ to $r_t$ decays exponentially as i increases.
This is reasonable as the dependence of a stationary series $r_t$ on its lagged value
$r_{t-i}$, if any, should decay over time.

The model in Eq. (2.17) can be rewritten in a rather compact form. To see this,
rewrite the model as

$$r_t + \theta_1 r_{t-1} + \theta_1^2 r_{t-2} + \cdots = \phi_0 + a_t. \tag{2.18}$$

The model for $r_{t-1}$ is then

$$r_{t-1} + \theta_1 r_{t-2} + \theta_1^2 r_{t-3} + \cdots = \phi_0 + a_{t-1}. \tag{2.19}$$

Multiplying Eq. (2.19) by $\theta_1$ and subtracting the result from Eq. (2.18), we obtain

$$r_t = \phi_0(1 - \theta_1) + a_t - \theta_1 a_{t-1},$$

which says that except for the constant term $r_t$ is a weighted average of shocks $a_t$
and $a_{t-1}$. Therefore, the model is called an MA model of order 1 or MA(1) model
for short. The general form of an MA(1) model is

$$r_t = c_0 + a_t - \theta_1 a_{t-1} \quad \text{or} \quad r_t = c_0 + (1 - \theta_1 B) a_t, \tag{2.20}$$

where $c_0$ is a constant and $\{a_t\}$ is a white noise series. Similarly, an MA(2) model
is in the form

$$r_t = c_0 + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2}, \tag{2.21}$$

and an MA(q) model is

$$r_t = c_0 + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}, \tag{2.22}$$

or $r_t = c_0 + (1 - \theta_1 B - \cdots - \theta_q B^q) a_t$, where $q > 0$.


2.5.1 Properties of MA Models


Again, we focus on the simple MA(1) and MA(2) models. The results of MA(q) 
models can easily be obtained by the same techniques. 


Stationarity 

Moving-average models are always weakly stationary because they are finite linear
combinations of a white noise sequence for which the first two moments are time
invariant. For example, consider the MA(1) model in Eq. (2.20). Taking expectation
of the model, we have

$$E(r_t) = c_0,$$

which is time invariant. Taking the variance of Eq. (2.20), we have

$$\text{Var}(r_t) = \sigma_a^2 + \theta_1^2 \sigma_a^2 = (1 + \theta_1^2)\sigma_a^2,$$

where we use the fact that $a_t$ and $a_{t-1}$ are uncorrelated. Again, $\text{Var}(r_t)$ is time
invariant. The prior discussion applies to general MA(q) models, and we obtain
two general properties. First, the constant term of an MA model is the mean of the
series [i.e., $E(r_t) = c_0$]. Second, the variance of an MA(q) model is

$$\text{Var}(r_t) = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma_a^2.$$

Autocorrelation Function 
Assume for simplicity that $c_0 = 0$ for an MA(1) model. Multiplying the model by
$r_{t-\ell}$, we have

$$r_{t-\ell} r_t = r_{t-\ell} a_t - \theta_1 r_{t-\ell} a_{t-1}.$$

Taking expectation, we obtain

$$\gamma_1 = -\theta_1 \sigma_a^2 \quad \text{and} \quad \gamma_\ell = 0, \quad \text{for } \ell > 1.$$

Using the prior result and the fact that $\text{Var}(r_t) = (1 + \theta_1^2)\sigma_a^2$, we have

$$\rho_0 = 1, \quad \rho_1 = \frac{-\theta_1}{1 + \theta_1^2}, \quad \rho_\ell = 0, \quad \text{for } \ell > 1.$$

Thus, for an MA(1) model, the lag-1 ACF is not zero, but all higher order ACFs
are zero. In other words, the ACF of an MA(1) model cuts off at lag 1. For the
MA(2) model in Eq. (2.21), the autocorrelation coefficients are

$$\rho_1 = \frac{-\theta_1 + \theta_1\theta_2}{1 + \theta_1^2 + \theta_2^2}, \quad \rho_2 = \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2}, \quad \rho_\ell = 0, \quad \text{for } \ell > 2.$$

Here the ACF cuts off at lag 2. This property generalizes to other MA models. For
an MA(q) model, the lag-q ACF is not zero, but $\rho_\ell = 0$ for $\ell > q$. Consequently,
an MA(q) series is only linearly related to its first q lagged values and hence is a
"finite-memory" model.
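The MA(2) autocorrelations can be checked against R's `ARMAacf`. One point needs care: R writes MA models with plus signs, $r_t = a_t + b_1 a_{t-1} + b_2 a_{t-2}$, so the coefficients passed to R are $b_i = -\theta_i$ in the notation of Eq. (2.21). The helper `ma2_acf` and the parameter values below are illustrative.

```r
# Lag-1 and lag-2 autocorrelations of r_t = a_t - theta1 a_{t-1} - theta2 a_{t-2}
ma2_acf <- function(theta1, theta2) {
  denom <- 1 + theta1^2 + theta2^2
  c(rho1 = (-theta1 + theta1 * theta2) / denom,
    rho2 = -theta2 / denom)
}

theta <- c(0.5, -0.3)                       # illustrative values
book <- ma2_acf(theta[1], theta[2])

# R's sign convention for MA coefficients is the opposite of the text's
r_acf <- ARMAacf(ma = -theta, lag.max = 4)  # lags 0, 1, 2, 3, 4

book
r_acf
```

The autocorrelations at lags 3 and 4 returned by `ARMAacf` are zero, which is the cutoff property of an MA(2) model.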


Invertibility 


Rewriting a zero-mean MA(1) model as $a_t = r_t + \theta_1 a_{t-1}$, one can use repeated
substitutions to obtain

$$a_t = r_t + \theta_1 r_{t-1} + \theta_1^2 r_{t-2} + \theta_1^3 r_{t-3} + \cdots.$$

This equation expresses the current shock $a_t$ as a linear combination of the present
and past returns. Intuitively, $\theta_1^j$ should go to zero as j increases because the remote
return $r_{t-j}$ should have very little impact on the current shock, if any. Consequently,
for an MA(1) model to be plausible, we require $|\theta_1| < 1$. Such an MA(1) model
is said to be invertible. If $|\theta_1| = 1$, then the MA(1) model is noninvertible. See
Section 2.6.5 for further discussion on invertibility.
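Invertibility can be illustrated by recovering the shocks of a simulated MA(1) series from a truncated version of the expansion above. This is a sketch under stated assumptions: the value $\theta_1 = 0.5$, the truncation point K = 30, and the simulated shocks are all arbitrary choices.

```r
set.seed(123)
theta1 <- 0.5
n <- 500
K <- 30                                  # truncation lag of the expansion

a <- rnorm(n + 1)                        # shocks a_0, a_1, ..., a_n
r <- a[-1] - theta1 * a[-(n + 1)]        # r_t = a_t - theta1 * a_{t-1}, t = 1, ..., n

# Truncated inversion: a_t is approximated by sum_{j=0}^{K} theta1^j * r_{t-j}
a_rec <- stats::filter(r, theta1^(0:K), method = "convolution", sides = 1)

# The first K values are NA because not enough past returns are available
err <- max(abs(a_rec - a[-1]), na.rm = TRUE)
err
```

The truncation error equals $\theta_1^{K+1} a_{t-K-1}$, so it vanishes geometrically when $|\theta_1| < 1$; with $|\theta_1| \ge 1$ the same sum would not settle down, which is the practical meaning of noninvertibility.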


2.5.2 Identifying MA Order 


The ACF is useful in identifying the order of an MA model. For a time series $r_t$
with ACF $\rho_\ell$, if $\rho_q \ne 0$, but $\rho_\ell = 0$ for $\ell > q$, then $r_t$ follows an MA(q) model.
Figure 2.8 shows the time plot of monthly simple returns of the CRSP
equal-weighted index from January 1926 to December 2008 and the sample ACF of the
series. The two dashed lines shown on the ACF plot denote the two standard error
limits. It is seen that the series has significant ACF at lags 1, 3, and 9. There are
some marginally significant ACF at higher lags, but we do not consider them here.
Based on the sample ACF, the following MA(9) model

$$r_t = c_0 + a_t - \theta_1 a_{t-1} - \theta_3 a_{t-3} - \theta_9 a_{t-9}$$

is identified for the series. Note that, unlike the sample PACF, sample ACF provides
information on the nonzero MA lags of the model.

Figure 2.8 Time plot and sample autocorrelation function of monthly simple returns of CRSP
equal-weighted index from January 1926 to December 2008.


2.5.3 Estimation 


Maximum-likelihood estimation is commonly used to estimate MA models. There 
are two approaches for evaluating the likelihood function of an MA model. The 
first approach assumes that the initial shocks (i.e., $a_t$ for $t \le 0$) are zero. As such, 
the shocks needed in likelihood function calculation are obtained recursively from 
the model, starting with $a_1 = r_1 - c_0$ and $a_2 = r_2 - c_0 + \theta_1 a_1$. This approach is 
referred to as the conditional-likelihood method and the resulting estimates the 
conditional maximum-likelihood estimates. The second approach treats the initial 
shocks $a_t$, $t \le 0$, as additional parameters of the model and estimates them jointly 
with other parameters. This approach is referred to as the exact-likelihood method. 
The exact-likelihood estimates are preferred over the conditional ones, especially 
when the MA model is close to being noninvertible. The exact method, however, 
requires more intensive computation. If the sample size is large, then the two types 
of maximum-likelihood estimates are close to each other. For details of conditional- 
and exact-likelihood estimates of MA models, readers are referred to Box, Jenkins, 
and Reinsel (1994) or Chapter 8. 
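
The recursion behind the conditional-likelihood method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an MA(1) model written as $r_t = c_0 + a_t - \theta_1 a_{t-1}$, Gaussian shocks, and the initial shock $a_0$ set to zero. The function names and the toy data are hypothetical and are not taken from the book's own programs.

```python
import math

def ma1_shocks(r, c0, theta1):
    """Recover a_1..a_T of r_t = c0 + a_t - theta1*a_{t-1}, assuming a_0 = 0."""
    shocks = []
    prev = 0.0  # initial shock a_0, set to zero by assumption
    for rt in r:
        at = rt - c0 + theta1 * prev
        shocks.append(at)
        prev = at
    return shocks

def conditional_loglik(r, c0, theta1, sigma):
    """Gaussian conditional log likelihood given a_0 = 0 (an assumed error law)."""
    a = ma1_shocks(r, c0, theta1)
    sse = sum(x * x for x in a)
    return -0.5 * len(a) * math.log(2.0 * math.pi * sigma ** 2) - sse / (2.0 * sigma ** 2)

returns = [0.02, -0.01, 0.03, 0.00]  # toy data, not from the book
shocks = ma1_shocks(returns, c0=0.01, theta1=-0.2)
```

Maximizing `conditional_loglik` over `c0`, `theta1`, and `sigma` with any numerical optimizer would give conditional maximum-likelihood estimates; the exact method differs in how it treats the initial shock.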

For illustration, consider the monthly simple return series of the CRSP equal- 
weighted index and the specified MA(9) model. The conditional maximum- 
likelihood method produces the fitted model 


$$r_t = 0.012 + a_t + 0.189a_{t-1} - 0.121a_{t-3} + 0.122a_{t-9}, \qquad \hat{\sigma}_a = 0.0714, \tag{2.23}$$


where standard errors of the coefficient estimates are 0.003, 0.031, 0.031, and 
0.031, respectively. The Ljung-Box statistics of the residuals give Q(12) = 17.5 
with a p value 0.041, which is based on an asymptotic chi-squared distribution 
with 9 degrees of freedom. The model needs some refinements in modeling the 
linear dynamic dependence of the data. The p value would be 0.132 if 12 degrees 
of freedom are used. The exact maximum-likelihood method produces the fitted 
model 


$$r_t = 0.012 + a_t + 0.191a_{t-1} - 0.120a_{t-3} + 0.123a_{t-9}, \qquad \hat{\sigma}_a = 0.0714, \tag{2.24}$$


where standard errors of the estimates are 0.003, 0.031, 0.031, and 0.031, respectively. 
The Ljung-Box statistics of the residuals give Q(12) = 17.6. The corresponding 
p values are 0.040 and 0.128, respectively, when the degrees of freedom 
are 9 and 12. Again, this fitted model is only marginally adequate. Comparing 
models (2.23) and (2.24), we see that for this particular instance, the difference 
between the conditional- and exact-likelihood methods is negligible. 
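
The p values quoted for Q(12) can be checked with a short sketch. The helper `chi2_sf` below is a hypothetical name; it evaluates the upper tail of a chi-squared distribution with integer degrees of freedom using the standard closed-form series, so only the Python standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df > 0."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # odd df: normal tail plus a finite sum of odd half-powers of x
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))  # two-sided standard normal tail at sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2.0 * density * total

p_9 = chi2_sf(17.5, 9)    # Q(12) = 17.5 with 9 degrees of freedom
p_12 = chi2_sf(17.5, 12)  # same statistic with 12 degrees of freedom
```

The two values agree with the 0.041 and 0.132 reported for model (2.23) after rounding, which shows how much the conclusion depends on the degrees of freedom used.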


2.5.4 Forecasting Using MA Models 


Forecasts of an MA model can easily be obtained. Because the model has finite 
memory, its point forecasts go to the mean of the series quickly. To see this, assume 
that the forecast origin is $h$ and let $F_h$ denote the information available at time $h$. 
For the 1-step-ahead forecast of an MA(1) process, the model says 


$$r_{h+1} = c_0 + a_{h+1} - \theta_1 a_h.$$

Taking the conditional expectation, we have 


$$\hat{r}_h(1) = E(r_{h+1} \mid F_h) = c_0 - \theta_1 a_h,$$
$$e_h(1) = r_{h+1} - \hat{r}_h(1) = a_{h+1}.$$


The variance of the 1-step-ahead forecast error is $\text{Var}[e_h(1)] = \sigma_a^2$. In practice, 
the quantity $a_h$ can be obtained in several ways. For instance, assume that $a_0 = 0$, 
then $a_1 = r_1 - c_0$, and we can compute $a_t$ for $2 \le t \le h$ recursively by using 
$a_t = r_t - c_0 + \theta_1 a_{t-1}$. Alternatively, it can be computed by using the AR representation 
of the MA(1) model; see Section 2.6.5. Of course, $a_t$ is the residual series of a 
fitted MA(1) model. Thus, $a_h$ is readily available from the estimation. 

For the 2-step-ahead forecast from the equation 


$$r_{h+2} = c_0 + a_{h+2} - \theta_1 a_{h+1},$$

we have 


$$\hat{r}_h(2) = E(r_{h+2} \mid F_h) = c_0,$$
$$e_h(2) = r_{h+2} - \hat{r}_h(2) = a_{h+2} - \theta_1 a_{h+1}.$$


The variance of the forecast error is $\text{Var}[e_h(2)] = (1 + \theta_1^2)\sigma_a^2$, which is the variance 
of the model and is greater than or equal to that of the 1-step-ahead forecast error. 
The prior result shows that for an MA(1) model the 2-step-ahead forecast of the 
series is simply the unconditional mean of the model. This is true for any forecast 
origin $h$. More generally, $\hat{r}_h(\ell) = c_0$ for $\ell \ge 2$. In summary, for an MA(1) model, 
the 1-step-ahead point forecast at the forecast origin $h$ is $c_0 - \theta_1 a_h$ and the multistep-ahead 
forecasts are $c_0$, which is the unconditional mean of the model. If we plot 
the forecasts $\hat{r}_h(\ell)$ versus $\ell$, we see that the forecasts form a horizontal line after 
one step. Thus, for MA(1) models, mean reversion takes only one time period. 
Similarly, for an MA(2) model, we have 


$$r_{h+\ell} = c_0 + a_{h+\ell} - \theta_1 a_{h+\ell-1} - \theta_2 a_{h+\ell-2},$$

from which we obtain 

$$\hat{r}_h(1) = c_0 - \theta_1 a_h - \theta_2 a_{h-1},$$
$$\hat{r}_h(2) = c_0 - \theta_2 a_h,$$
$$\hat{r}_h(\ell) = c_0, \quad \text{for } \ell > 2.$$


Thus, the multistep-ahead forecasts of an MA(2) model go to the mean of the series 
after two steps. The variances of forecast errors go to the variance of the series 
after two steps. In general, for an MA(q) model, multistep-ahead forecasts go to 
the mean after the first q steps. 
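
The finite memory of MA forecasts can be illustrated with a small sketch. The function `ma_forecast` and its arguments are hypothetical names; it assumes the shocks up to the forecast origin are already available (for example, as residuals of a fitted model) and uses the sign convention $r_t = c_0 + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$.

```python
def ma_forecast(c0, theta, recent_shocks, sigma_a, steps):
    """Point forecasts and forecast-error variances of an MA(q) model.

    theta holds theta_1..theta_q; recent_shocks holds a_h, a_{h-1}, ...,
    a_{h-q+1} (most recent first).
    """
    q = len(theta)
    forecasts, variances = [], []
    for ell in range(1, steps + 1):
        point = c0
        # only shocks dated h or earlier are known; future shocks have mean zero
        for i in range(ell, q + 1):
            point -= theta[i - 1] * recent_shocks[i - ell]
        # e_h(ell) = a_{h+ell} - theta_1*a_{h+ell-1} - ... over the unknown shocks
        var = sigma_a ** 2 * (1.0 + sum(t * t for t in theta[:ell - 1]))
        forecasts.append(point)
        variances.append(var)
    return forecasts, variances

# illustrative MA(2) with made-up parameters and shocks
f, v = ma_forecast(0.01, [0.5, 0.3], [0.02, -0.01], 0.1, 4)
```

In this example the forecasts equal `c0` from the third step onward, and the variances stop growing at the same horizon, as the text describes for an MA(2) model.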

Table 2.3 gives some out-of-sample forecasts of an MA(9) model in the form of 
Eq. (2.24) for the monthly simple returns of the equal-weighted index at the forecast 
origin h = 986 (February 2008). The model parameters are reestimated using 
the first 986 observations. The sample mean and standard error of the estimation 
subsample are 0.0128 and 0.0736, respectively. As expected, the table shows that 
(a) the 10-step-ahead forecast is the sample mean, and (b) the standard deviations 
of the forecast errors converge to the standard deviation of the series as the forecast 
horizon increases. In this particular case, the point forecasts deviate substantially 
from the observed returns because of the worldwide financial crisis caused by the 
subprime mortgage problem and the collapse of Lehman Brothers. 


Summary 
A brief summary of AR and MA models is in order. We have discussed the following 
properties: 


• For MA models, ACF is useful in specifying the order because ACF cuts off 
at lag q for an MA(q) series. 

• For AR models, PACF is useful in order determination because PACF cuts 
off at lag p for an AR(p) process. 

• An MA series is always stationary, but for an AR series to be stationary, all 
of its characteristic roots must be less than 1 in modulus. 

• For a stationary series, the multistep-ahead forecasts converge to the mean of 
the series, and the variances of forecast errors converge to the variance of the 
series as the forecast horizon increases. 


TABLE 2.3 Out-of-Sample Forecast Performance of an MA(9) Model for Monthly 
Simple Returns of CRSP Equal-Weighted Index^a 

Step              1        2        3        4        5
Forecast     0.0043   0.0136   0.0150   0.0144   0.0120
Std. Error   0.0712   0.0724   0.0729   0.0729   0.0729
Actual      -0.0260   0.0312   0.0322  -0.0871  -0.0010

Step              6        7        8        9       10
Forecast     0.0019   0.0122   0.0056   0.0085   0.0128
Std. Error   0.0729   0.0729   0.0729   0.0729   0.0734
Actual       0.0141  -0.1209  -0.2060  -0.1366   0.0431

^a The forecast origin is February 2008 with h = 986. The model is estimated by the exact 
maximum-likelihood method. 


2.6 SIMPLE ARMA MODELS 


In some applications, the AR or MA models discussed in the previous sections 
become cumbersome because one may need a high-order model with many param- 
eters to adequately describe the dynamic structure of the data. To overcome this 
difficulty, the autoregressive moving-average (ARMA) models are introduced; see 
Box, Jenkins, and Reinsel (1994). Basically, an ARMA model combines the ideas 
of AR and MA models into a compact form so that the number of parameters used 
is kept small, achieving parsimony in parameterization. For the return series in 
finance, the chance of using ARMA models is low. However, the concept of ARMA 
models is highly relevant in volatility modeling. As a matter of fact, the generalized 
autoregressive conditional heteroscedastic (GARCH) model can be regarded as an 
ARMA model, albeit nonstandard, for the $a_t^2$ series; see Chapter 3 for details. In 
this section, we study the simplest ARMA(1,1) model. 
A time series $r_t$ follows an ARMA(1,1) model if it satisfies 


$$r_t - \phi_1 r_{t-1} = \phi_0 + a_t - \theta_1 a_{t-1}, \tag{2.25}$$


where $\{a_t\}$ is a white noise series. The left-hand side of Eq. (2.25) is the AR 
component of the model and the right-hand side gives the MA component. The 
constant term is $\phi_0$. For this model to be meaningful, we need $\phi_1 \neq \theta_1$; otherwise, 
there is a cancellation in the equation and the process reduces to a white noise series. 


2.6.1 Properties of ARMA(1,1) Models 


Properties of ARMA(1,1) models are generalizations of those of AR(1) models 
with some minor modifications to handle the impact of the MA(1) component. We 
start with the stationarity condition. Taking expectation of Eq. (2.25), we have 


$$E(r_t) - \phi_1 E(r_{t-1}) = \phi_0 + E(a_t) - \theta_1 E(a_{t-1}).$$

Because $E(a_i) = 0$ for all $i$, the mean of $r_t$ is 

$$E(r_t) = \mu = \frac{\phi_0}{1 - \phi_1},$$

provided that the series is weakly stationary. This result is exactly the same as that 
of the AR(1) model in Eq. (2.8). 

Next, assuming for simplicity that $\phi_0 = 0$, we consider the autocovariance function 
of $r_t$. First, multiplying the model by $a_t$ and taking expectation, we have 


$$E(r_t a_t) = E(a_t^2) - \theta_1 E(a_t a_{t-1}) = E(a_t^2) = \sigma_a^2. \tag{2.26}$$

Rewriting the model as 

$$r_t = \phi_1 r_{t-1} + a_t - \theta_1 a_{t-1}$$

and taking the variance of the prior equation, we have 

$$\text{Var}(r_t) = \phi_1^2 \text{Var}(r_{t-1}) + \sigma_a^2 + \theta_1^2 \sigma_a^2 - 2\phi_1\theta_1 E(r_{t-1} a_{t-1}).$$


Here we make use of the fact that $r_{t-1}$ and $a_t$ are uncorrelated. Using Eq. (2.26), 
we obtain 


$$\text{Var}(r_t) - \phi_1^2 \text{Var}(r_{t-1}) = (1 - 2\phi_1\theta_1 + \theta_1^2)\sigma_a^2.$$


Therefore, if the series $r_t$ is weakly stationary, then $\text{Var}(r_t) = \text{Var}(r_{t-1})$ and we 
have 


$$\text{Var}(r_t) = \frac{(1 - 2\phi_1\theta_1 + \theta_1^2)\sigma_a^2}{1 - \phi_1^2}.$$

Because the variance is positive, we need $\phi_1^2 < 1$ (i.e., $|\phi_1| < 1$). Again, this is 
precisely the same stationarity condition as that of the AR(1) model. 
To obtain the autocovariance function of $r_t$, we assume $\phi_0 = 0$ and multiply the 
model in Eq. (2.25) by $r_{t-\ell}$ to obtain 


$$r_t r_{t-\ell} - \phi_1 r_{t-1} r_{t-\ell} = a_t r_{t-\ell} - \theta_1 a_{t-1} r_{t-\ell}.$$


For $\ell = 1$, taking expectation and using Eq. (2.26) for $t - 1$, we have 

$$\gamma_1 - \phi_1 \gamma_0 = -\theta_1 \sigma_a^2,$$


where $\gamma_\ell = \text{Cov}(r_t, r_{t-\ell})$. This result is different from that of the AR(1) case for 
which $\gamma_1 - \phi_1 \gamma_0 = 0$. However, for $\ell = 2$ and taking expectation, we have 


$$\gamma_2 - \phi_1 \gamma_1 = 0,$$

which is identical to that of the AR(1) case. In fact, the same technique yields 

$$\gamma_\ell - \phi_1 \gamma_{\ell-1} = 0, \quad \text{for } \ell > 1. \tag{2.27}$$

In terms of ACF, the previous results show that for a stationary ARMA(1,1) model 

$$\rho_1 = \phi_1 - \frac{\theta_1 \sigma_a^2}{\gamma_0}, \qquad \rho_\ell = \phi_1 \rho_{\ell-1}, \quad \text{for } \ell > 1.$$

Thus, the ACF of an ARMA(1,1) model behaves very much like that of an AR(1) 
model except that the exponential decay starts with lag 2. Consequently, the ACF 
of an ARMA(1,1) model does not cut off at any finite lag. 

Turning to PACF, one can show that the PACF of an ARMA(1,1) model does 
not cut off at any finite lag either. It behaves very much like that of an MA(1) 
model except that the exponential decay starts with lag 2 instead of lag 1. 

In summary, the stationarity condition of an ARMA(1,1) model is the same as 
that of an AR(1) model, and the ACF of an ARMA(1,1) model exhibits a pattern 
similar to that of an AR(1) model except that the pattern starts at lag 2. 
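
The variance and ACF formulas for the ARMA(1,1) model are easy to evaluate numerically. The sketch below uses a hypothetical function name, `arma11_acf`, and assumes $\phi_0 = 0$ and the parameterization of Eq. (2.25).

```python
def arma11_acf(phi1, theta1, max_lag, sigma_a=1.0):
    """Variance and ACF of r_t - phi1*r_{t-1} = a_t - theta1*a_{t-1}, |phi1| < 1."""
    if abs(phi1) >= 1:
        raise ValueError("stationarity requires |phi1| < 1")
    var_a = sigma_a ** 2
    gamma0 = (1.0 - 2.0 * phi1 * theta1 + theta1 ** 2) * var_a / (1.0 - phi1 ** 2)
    rho = [1.0, phi1 - theta1 * var_a / gamma0]  # lag 0 and lag 1
    for _ in range(2, max_lag + 1):
        rho.append(phi1 * rho[-1])               # rho_l = phi1 * rho_{l-1} for l > 1
    return gamma0, rho[:max_lag + 1]

gamma0, rho = arma11_acf(0.8, 0.3, 5)
```

Calling the function with `phi1 == theta1` returns zero autocorrelations at all positive lags, which is the cancellation to white noise noted after Eq. (2.25); setting `theta1 = 0` recovers the AR(1) pattern.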


2.6.2 General ARMA Models 
A general ARMA(p, q) model is in the form 


$$r_t = \phi_0 + \sum_{i=1}^{p} \phi_i r_{t-i} + a_t - \sum_{i=1}^{q} \theta_i a_{t-i},$$


where $\{a_t\}$ is a white noise series and p and q are nonnegative integers. The AR 
and MA models are special cases of the ARMA(p, q) model. Using the back-shift 
operator, the model can be written as 


$$(1 - \phi_1 B - \cdots - \phi_p B^p) r_t = \phi_0 + (1 - \theta_1 B - \cdots - \theta_q B^q) a_t. \tag{2.28}$$

The polynomial $1 - \phi_1 B - \cdots - \phi_p B^p$ is the AR polynomial of the model. Similarly, 
$1 - \theta_1 B - \cdots - \theta_q B^q$ is the MA polynomial. We require that there are no 
common factors between the AR and MA polynomials; otherwise the order (p, q) 
of the model can be reduced. Like a pure AR model, the AR polynomial introduces 
the characteristic equation of an ARMA model. If all of the characteristic roots 
of the model are less than 1 in absolute value (equivalently, all of the zeros of the 
AR polynomial are greater than 1 in absolute value), then the ARMA model is 
weakly stationary. In this case, the unconditional mean of the model is 

$$E(r_t) = \frac{\phi_0}{1 - \phi_1 - \cdots - \phi_p}.$$


2.6.3 Identifying ARMA Models 


The ACF and PACF are not informative in determining the order of an ARMA 
model. Tsay and Tiao (1984) propose a new approach that uses the extended auto- 
correlation function (EACF) to specify the order of an ARMA process. The basic 
idea of EACF is relatively simple. If we can obtain a consistent estimate of the AR 
component of an ARMA model, then we can derive the MA component. From the 
derived MA series, we can use ACF to identify the order of the MA component. 
The derivation of EACF is relatively involved; see Tsay and Tiao (1984) for 
details. Yet the function is easy to use. The output of EACF is a two-way table, 
where the rows correspond to AR order p and the columns to MA order q. The 
theoretical version of EACF for an ARMA(1,1) model is given in Table 2.4. The 
key feature of the table is that it contains a triangle of O with the upper left vertex 
located at the order (1,1). This is the characteristic we use to identify the order of 
an ARMA process. In general, for an ARMA(p, q) model, the triangle of O will 
have its upper left vertex at the (p, q) position. 


TABLE 2.4 Theoretical EACF Table for an ARMA(1,1) Model, Where X Denotes 
Nonzero, O Denotes Zero, and * Denotes Either Zero or Nonzero^a 

                   MA
AR    0   1   2   3   4   5   6   7
0     X   X   X   X   X   X   X   X
1     X   O   O   O   O   O   O   O
2     *   X   O   O   O   O   O   O
3     *   *   X   O   O   O   O   O
4     *   *   *   X   O   O   O   O
5     *   *   *   *   X   O   O   O

^a This latter category does not play any role in identifying the order (1,1). 

For illustration, consider the monthly log stock returns of the 3M Company from 
February 1946 to December 2008. There are 755 observations. The return series 
and its sample ACF are shown in Figure 2.9. The ACF indicates that there are 
no significant serial correlations in the data at the 1% level. Table 2.5 shows the 
sample EACF and a corresponding simplified table for the series. The simplified 
table is constructed by using the following notation: 


1. X denotes that the absolute value of the corresponding EACF is greater than 
or equal to $2/\sqrt{T}$, which is twice the asymptotic standard error of the 
EACF. 


2. O denotes that the corresponding EACF is less than $2/\sqrt{T}$ in modulus. 


The simplified table exhibits a triangular pattern of O with its upper left vertex 
at the order (p, q) = (0, 0). A few exceptions of X appear when q = 2, 5, 9, and 
11. However, the EACF table shows that the values of sample ACF corresponding 
to those X are around 0.08 or 0.09. These ACFs are only slightly greater than 
$2/\sqrt{755} = 0.073$. Indeed, if the 1% critical value is used, those X would become O in 
the simplified EACF table. Consequently, the EACF suggests that the monthly log 
returns of 3M stock follow an ARMA(0,0) model (i.e., a white noise series). This 
is in agreement with the result suggested by the sample ACF in Figure 2.9. 
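
The rule for building the simplified table can be written as a short sketch. The function name `simplify_eacf` is hypothetical; the input values are the first two rows (p = 0 and p = 1) of the sample EACF in Table 2.5, as printed to two decimals.

```python
import math

def simplify_eacf(row, n_obs):
    """Map sample EACF values to 'X' (|value| >= 2/sqrt(T)) or 'O' (otherwise)."""
    bound = 2.0 / math.sqrt(n_obs)
    return "".join("X" if abs(v) >= bound else "O" for v in row)

# rows p = 0 and p = 1 of the sample EACF in Table 2.5, columns q = 0..12
row_p0 = [0.06, -0.04, -0.08, -0.00, 0.02, 0.08, 0.01, 0.01, -0.03, -0.08, 0.05, 0.09, -0.01]
row_p1 = [-0.47, 0.01, -0.07, -0.02, 0.00, 0.08, -0.03, 0.00, -0.01, -0.07, 0.04, 0.09, -0.02]

pattern_p0 = simplify_eacf(row_p0, 755)  # T = 755 observations
pattern_p1 = simplify_eacf(row_p1, 755)
```

Because the printed values are rounded, an entry very close to the bound could in principle be classified differently from the table computed with full precision; for these two rows the patterns match the simplified table.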

The information criteria discussed earlier can also be used to select ARMA(p, q) 
models. Typically, for some prespecified positive integers P and Q, one computes 
AIC (or BIC) for ARMA(p, q) models, where $0 \le p \le P$ and $0 \le q \le Q$, and 
selects the model that gives the minimum AIC (or BIC). This approach requires 
maximum-likelihood estimation of many models and in some cases may encounter 
the difficulty of overfitting in estimation. 

Figure 2.9 Time plot and sample autocorrelation function of monthly log stock returns of 3M Company 
from February 1946 to December 2008. 


Once an ARMA(p, q) model is specified, its parameters can be estimated by 
either the conditional or exact-likelihood method. In addition, the Ljung-Box statistics 
of the residuals can be used to check the adequacy of a fitted model. If the 
model is correctly specified, then Q(m) follows asymptotically a chi-squared distribution 
with $m - g$ degrees of freedom, where $g$ denotes the number of AR or 
MA coefficients fitted in the model. 


TABLE 2.5 Sample Extended Autocorrelation Function and a Simplified Table for 
the Monthly Log Returns of 3M Stock from February 1946 to December 2008 

Sample Extended Autocorrelation Function 

                                        MA Order: q
p       0      1      2      3      4      5      6      7      8      9     10     11     12
0    0.06  -0.04  -0.08  -0.00   0.02   0.08   0.01   0.01  -0.03  -0.08   0.05   0.09  -0.01
1   -0.47   0.01  -0.07  -0.02   0.00   0.08  -0.03   0.00  -0.01  -0.07   0.04   0.09  -0.02
2   -0.38  -0.35  -0.07   0.02  -0.01   0.08   0.03   0.01   0.00  -0.03   0.02   0.04   0.04
3   -0.18   0.14   0.38  -0.02   0.00   0.04  -0.02   0.02  -0.00  -0.03   0.02   0.01   0.04
4    0.42   0.03   0.45  -0.01   0.00   0.00  -0.01   0.03   0.01   0.00   0.02  -0.00   0.01
5   -0.11   0.21   0.45   0.01   0.20  -0.01  -0.00   0.04  -0.01  -0.01   0.03   0.01   0.03
6   -0.21  -0.25   0.24   0.31   0.17  -0.04  -0.00   0.04  -0.01  -0.03   0.01   0.01   0.04

Simplified EACF Table 

                          MA Order: q
p    0   1   2   3   4   5   6   7   8   9  10  11  12
0    O   O   X   O   O   X   O   O   O   X   O   X   O
1    X   O   O   O   O   X   O   O   O   O   O   X   O
2    X   X   O   O   O   X   O   O   O   O   O   O   O
3    X   X   X   O   O   O   O   O   O   O   O   O   O
4    X   O   X   O   O   O   O   O   O   O   O   O   O
5    X   X   X   O   X   O   O   O   O   O   O   O   O
6    X   X   X   X   X   O   O   O   O   O   O   O   O


2.6.4 Forecasting Using an ARMA Model 


Like the behavior of ACF, forecasts of an ARMA(p, q) model have characteristics 
similar to those of an AR(p) model after adjusting for the impacts of the MA 
component on the lower horizon forecasts. Denote the forecast origin by $h$ and 
the available information by $F_h$. The 1-step-ahead forecast of $r_{h+1}$ can be easily 
obtained from the model as 


$$\hat{r}_h(1) = E(r_{h+1} \mid F_h) = \phi_0 + \sum_{i=1}^{p} \phi_i r_{h+1-i} - \sum_{i=1}^{q} \theta_i a_{h+1-i},$$


and the associated forecast error is $e_h(1) = r_{h+1} - \hat{r}_h(1) = a_{h+1}$. The variance of 
the 1-step-ahead forecast error is $\text{Var}[e_h(1)] = \sigma_a^2$. For the $\ell$-step-ahead forecast, we 
have 


$$\hat{r}_h(\ell) = E(r_{h+\ell} \mid F_h) = \phi_0 + \sum_{i=1}^{p} \phi_i \hat{r}_h(\ell - i) - \sum_{i=1}^{q} \theta_i a_h(\ell - i),$$


where it is understood that $\hat{r}_h(\ell - i) = r_{h+\ell-i}$ if $\ell - i \le 0$ and $a_h(\ell - i) = 0$ if 
$\ell - i > 0$ and $a_h(\ell - i) = a_{h+\ell-i}$ if $\ell - i \le 0$. Thus, the multistep-ahead forecasts 
of an ARMA model can be computed recursively. The associated forecast error is 


$$e_h(\ell) = r_{h+\ell} - \hat{r}_h(\ell),$$


which can be computed easily via a formula to be given in Eq. (2.34). 
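
The recursive computation of multistep-ahead forecasts can be sketched as follows. The function `arma_forecast` is a hypothetical helper, not part of any package; it assumes the returns and shocks up to the forecast origin are known and that coefficients follow the sign convention of Eq. (2.28).

```python
def arma_forecast(phi0, phi, theta, r_hist, a_hist, steps):
    """Recursive multistep forecasts of an ARMA(p, q) model.

    phi holds phi_1..phi_p and theta holds theta_1..theta_q. r_hist and
    a_hist hold returns and shocks up to the forecast origin h, oldest
    first, with at least p and q values, respectively.
    """
    r_ext = list(r_hist)  # extended with forecasts as the horizon grows
    a_ext = list(a_hist)  # extended with zeros: future shocks have mean zero
    forecasts = []
    for _ in range(steps):
        value = phi0
        value += sum(phi[i] * r_ext[-1 - i] for i in range(len(phi)))
        value -= sum(theta[i] * a_ext[-1 - i] for i in range(len(theta)))
        forecasts.append(value)
        r_ext.append(value)
        a_ext.append(0.0)
    return forecasts

# illustrative ARMA(1,1) with made-up values
fc = arma_forecast(0.01, [0.5], [0.3], r_hist=[0.02], a_hist=[0.01], steps=3)
```

In this example the forecasts move toward the unconditional mean 0.01/(1 - 0.5) = 0.02 as the horizon grows, and the MA term affects only the first step.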


2.6.5 Three Model Representations for an ARMA Model 


In this section, we briefly discuss three model representations for a stationary 
ARMA(p, q) model. The three representations serve three different purposes. 
Knowing these representations can lead to a better understanding of the model. The 
first representation is the ARMA(p, q) model in Eq. (2.28). This representation 
is compact and useful in parameter estimation. It is also useful in computing 
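recursively multistep-ahead forecasts of $r_t$; see the discussion in the last section. 
For the other two representations, we use long division of two polynomials. 
Given two polynomials $\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^i$ and $\theta(B) = 1 - \sum_{i=1}^{q} \theta_i B^i$, we 
can obtain, by long division, that 


$$\frac{\theta(B)}{\phi(B)} = 1 + \psi_1 B + \psi_2 B^2 + \cdots \equiv \psi(B) \tag{2.29}$$

and 

$$\frac{\phi(B)}{\theta(B)} = 1 - \pi_1 B - \pi_2 B^2 - \cdots \equiv \pi(B). \tag{2.30}$$


For instance, if $\phi(B) = 1 - \phi_1 B$ and $\theta(B) = 1 - \theta_1 B$, then 


$$\psi(B) = \frac{1 - \theta_1 B}{1 - \phi_1 B} = 1 + (\phi_1 - \theta_1)B + \phi_1(\phi_1 - \theta_1)B^2 + \phi_1^2(\phi_1 - \theta_1)B^3 + \cdots,$$
$$\pi(B) = \frac{1 - \phi_1 B}{1 - \theta_1 B} = 1 - (\phi_1 - \theta_1)B - \theta_1(\phi_1 - \theta_1)B^2 - \theta_1^2(\phi_1 - \theta_1)B^3 - \cdots.$$


From the definition, $\psi(B)\pi(B) = 1$. Making use of the fact that $Bc = c$ for any 
constant (because the value of a constant is time invariant), we have 


$$\frac{\phi_0}{\theta(1)} = \frac{\phi_0}{1 - \theta_1 - \cdots - \theta_q} \quad \text{and} \quad \frac{\phi_0}{\phi(1)} = \frac{\phi_0}{1 - \phi_1 - \cdots - \phi_p}.$$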
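
The long division of two polynomials in $B$ can be carried out numerically as a power-series division. The sketch below uses a hypothetical helper, `series_divide`, with polynomials stored as coefficient lists in ascending powers of $B$; the parameter values are made up for illustration.

```python
def series_divide(num, den, n_terms):
    """First n_terms coefficients of num(B)/den(B), ascending powers, den[0] != 0."""
    out = []
    for k in range(n_terms):
        acc = num[k] if k < len(num) else 0.0
        for j in range(1, min(k, len(den) - 1) + 1):
            acc -= den[j] * out[k - j]
        out.append(acc / den[0])
    return out

phi1, theta1 = 0.8, 0.3
phi_poly = [1.0, -phi1]      # phi(B) = 1 - phi1*B
theta_poly = [1.0, -theta1]  # theta(B) = 1 - theta1*B

psi = series_divide(theta_poly, phi_poly, 5)      # 1, psi_1, psi_2, ...
pi_poly = series_divide(phi_poly, theta_poly, 5)  # 1, -pi_1, -pi_2, ...
```

For the ARMA(1,1) case the output reproduces the closed forms above: `psi[1]` equals $\phi_1 - \theta_1$ and `pi_poly[2]` equals $-\theta_1(\phi_1 - \theta_1)$. Multiplying the two truncated series gives 1 followed by zeros up to the truncation order.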


AR Representation 
Using the result of long division in Eq. (2.30), the ARMA(p, q) model can be 
written as 


$$r_t = \frac{\phi_0}{1 - \theta_1 - \cdots - \theta_q} + \pi_1 r_{t-1} + \pi_2 r_{t-2} + \pi_3 r_{t-3} + \cdots + a_t. \tag{2.31}$$


This representation shows the dependence of the current return $r_t$ on the past 
returns $r_{t-i}$, where $i > 0$. The coefficients $\{\pi_i\}$ are referred to as the $\pi$ weights of 
an ARMA model. To show that the contribution of the lagged value $r_{t-i}$ to $r_t$ is 
diminishing as $i$ increases, the $\pi_i$ coefficient should decay to zero as $i$ increases. An 
ARMA(p, q) model that has this property is said to be invertible. For a pure AR 
model, $\theta(B) = 1$ so that $\pi(B) = \phi(B)$, which is a finite-degree polynomial. Thus, 
$\pi_i = 0$ for $i > p$, and the model is invertible. For other ARMA models, a sufficient 
condition for invertibility is that all the zeros of the polynomial $\theta(B)$ are greater 
than unity in modulus. For example, consider the MA(1) model $r_t = (1 - \theta_1 B)a_t$. 
The zero of the first-order polynomial $1 - \theta_1 B$ is $B = 1/\theta_1$. Therefore, an MA(1) 
model is invertible if $|1/\theta_1| > 1$. This is equivalent to $|\theta_1| < 1$. 

From the AR representation in Eq. (2.31), an invertible ARMA(p, q) series $r_t$ 
is a linear combination of the current shock $a_t$ and a weighted average of the past 
values. The weights decay exponentially for more remote past values. 




MA Representation 
Again, using the result of long division in Eq. (2.29), an ARMA(p, q) model can 
also be written as 


$$r_t = \mu + a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots = \mu + \psi(B)a_t, \tag{2.32}$$


where $\mu = E(r_t) = \phi_0/(1 - \phi_1 - \cdots - \phi_p)$. This representation shows explicitly 
the impact of the past shock $a_{t-i}$ ($i > 0$) on the current return $r_t$. The coefficients 
$\{\psi_i\}$ are referred to as the impulse response function of the ARMA model. For 
a weakly stationary series, the $\psi_i$ coefficients decay exponentially as $i$ increases. 
This is understandable as the effect of shock $a_{t-i}$ on the return $r_t$ should diminish 
over time. Thus, for a stationary ARMA model, the shock $a_{t-i}$ does not have 
a permanent impact on the series. If $\phi_0 \neq 0$, then the MA representation has a 
constant term, which is the mean of $r_t$ [i.e., $\phi_0/(1 - \phi_1 - \cdots - \phi_p)$]. 

The MA representation in Eq. (2.32) is also useful in computing the variance 
of a forecast error. At the forecast origin $h$, we have the shocks $a_h, a_{h-1}, \ldots$. 
Therefore, the $\ell$-step-ahead point forecast is 


$$\hat{r}_h(\ell) = \mu + \psi_\ell a_h + \psi_{\ell+1} a_{h-1} + \cdots, \tag{2.33}$$

and the associated forecast error is 

$$e_h(\ell) = a_{h+\ell} + \psi_1 a_{h+\ell-1} + \cdots + \psi_{\ell-1} a_{h+1}.$$

Consequently, the variance of the $\ell$-step-ahead forecast error is 

$$\text{Var}[e_h(\ell)] = (1 + \psi_1^2 + \cdots + \psi_{\ell-1}^2)\sigma_a^2, \tag{2.34}$$


which, as expected, is a nondecreasing function of the forecast horizon $\ell$. 

Finally, the MA representation in Eq. (2.32) provides a simple proof of mean 
reversion of a stationary time series. The stationarity implies that $\psi_i$ approaches 
zero as $i \to \infty$. Therefore, by Eq. (2.33), we have $\hat{r}_h(\ell) \to \mu$ as $\ell \to \infty$. Because 
$\hat{r}_h(\ell)$ is the conditional expectation of $r_{h+\ell}$ at the forecast origin $h$, the result 
says that in the long term the return series is expected to approach its mean, 
that is, the series is mean reverting. Furthermore, using the MA representation in 
Eq. (2.32), we have $\text{Var}(r_t) = (1 + \sum_{i=1}^{\infty} \psi_i^2)\sigma_a^2$. Consequently, by Eq. (2.34), we 
have $\text{Var}[e_h(\ell)] \to \text{Var}(r_t)$ as $\ell \to \infty$. The speed by which $\hat{r}_h(\ell)$ approaches $\mu$ 
determines the speed of mean reverting. 
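
Equation (2.34) and the limit of the forecast-error variance can be checked numerically. The sketch below assumes a stationary ARMA(1,1) model, for which $\psi_i = \phi_1^{i-1}(\phi_1 - \theta_1)$ for $i \ge 1$; the function names and parameter values are hypothetical.

```python
def arma11_psi(phi1, theta1, n):
    """psi_0..psi_{n-1} of a stationary ARMA(1,1) model, with psi_0 = 1."""
    return [1.0] + [phi1 ** (i - 1) * (phi1 - theta1) for i in range(1, n)]

def forecast_error_variance(psi, sigma_a, ell):
    """Var[e_h(ell)] = (1 + psi_1^2 + ... + psi_{ell-1}^2) * sigma_a^2."""
    return sigma_a ** 2 * sum(w * w for w in psi[:ell])

phi1, theta1, sigma_a = 0.8, 0.3, 0.05
psi = arma11_psi(phi1, theta1, 200)
variances = [forecast_error_variance(psi, sigma_a, ell) for ell in (1, 2, 5, 200)]

# closed-form variance of the series from the ARMA(1,1) derivation
series_variance = (1 - 2 * phi1 * theta1 + theta1 ** 2) * sigma_a ** 2 / (1 - phi1 ** 2)
```

The first entry of `variances` is $\sigma_a^2$, the entries increase with the horizon, and the last one is numerically indistinguishable from `series_variance`.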


2.7 UNIT-ROOT NONSTATIONARITY 


So far we have focused on return series that are stationary. In some studies, interest 
rates, foreign exchange rates, or the price series of an asset are of interest. These 
series tend to be nonstationary. For a price series, the nonstationarity is mainly 
due to the fact that there is no fixed level for the price. In the time series 
literature, such a nonstationary series is called a unit-root nonstationary time series. 
The best-known example of a unit-root nonstationary time series is the random-walk 
model. 


2.7.1 Random Walk 


A time series $\{p_t\}$ is a random walk if it satisfies 

$$p_t = p_{t-1} + a_t, \tag{2.35}$$


where $p_0$ is a real number denoting the starting value of the process and $\{a_t\}$ is a 
white noise series. If $p_t$ is the log price of a particular stock at date $t$, then $p_0$ could 
be the log price of the stock at its initial public offering (IPO) (i.e., the logged IPO 
price). If $a_t$ has a symmetric distribution around zero, then conditional on $p_{t-1}$, 
$p_t$ has a 50-50 chance to go up or down, implying that $p_t$ would go up or down 
at random. If we treat the random-walk model as a special AR(1) model, then the 
coefficient of $p_{t-1}$ is unity, which does not satisfy the weak stationarity condition 
of an AR(1) model. A random-walk series is, therefore, not weakly stationary, and 
we call it a unit-root nonstationary time series. 

The random-walk model has widely been considered as a statistical model for 
the movement of logged stock prices. Under such a model, the stock price is not 
predictable or mean reverting. To see this, the 1-step-ahead forecast of model (2.35) 
at the forecast origin h is 


$$\hat{p}_h(1) = E(p_{h+1} \mid p_h, p_{h-1}, \ldots) = p_h,$$


which is the log price of the stock at the forecast origin. Such a forecast has no 
practical value. The 2-step-ahead forecast is 


$$\hat{p}_h(2) = E(p_{h+2} \mid p_h, p_{h-1}, \ldots) = E(p_{h+1} + a_{h+2} \mid p_h, p_{h-1}, \ldots)$$
$$= E(p_{h+1} \mid p_h, p_{h-1}, \ldots) = \hat{p}_h(1) = p_h,$$


which again is the log price at the forecast origin. In fact, for any forecast horizon 
$\ell > 0$, we have 


$$\hat{p}_h(\ell) = p_h.$$


Thus, for all forecast horizons, point forecasts of a random-walk model are simply 
the value of the series at the forecast origin. Therefore, the process is not mean 
reverting. 

The MA representation of the random-walk model in Eq. (2.35) is 


$$p_t = a_t + a_{t-1} + a_{t-2} + \cdots.$$

This representation has several important practical implications. First, the $\ell$-step-ahead 
forecast error is 


$$e_h(\ell) = a_{h+\ell} + \cdots + a_{h+1},$$


so that $\text{Var}[e_h(\ell)] = \ell\sigma_a^2$, which diverges to infinity as $\ell \to \infty$. The length of an 
interval forecast of $p_{h+\ell}$ will approach infinity as the forecast horizon increases. 
This result says that the usefulness of point forecast $\hat{p}_h(\ell)$ diminishes as $\ell$ increases, 
which again implies that the model is not predictable. Second, the unconditional 
variance of $p_t$ is unbounded because $\text{Var}[e_h(\ell)]$ approaches infinity as $\ell$ increases. 
Theoretically, this means that $p_t$ can assume any real value for a sufficiently large $t$. 
For the log price $p_t$ of an individual stock, this is plausible. Yet for market indexes, 
negative log price is very rare if it happens at all. In this sense, the adequacy of a 
random-walk model for market indexes is questionable. Third, from the representation, 
$\psi_i = 1$ for all $i$. Thus, the impact of any past shock $a_{t-i}$ on $p_t$ does not 
decay over time. Consequently, the series has a strong memory as it remembers all 
of the past shocks. In economics, the shocks are said to have a permanent effect 
on the series. The strong memory of a unit-root time series can be seen from the 
sample ACF of the observed series. The sample ACFs are all approaching 1 as the 
sample size increases. 
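
The widening of interval forecasts for a random walk can be illustrated with a short sketch. The function name is hypothetical, and the 1.96 multiplier is an added assumption (Gaussian shocks and an approximate 95% interval) that is not part of the model itself.

```python
import math

def random_walk_forecast(p_h, sigma_a, steps, z=1.96):
    """Point forecasts and approximate 95% intervals for p_t = p_{t-1} + a_t."""
    out = []
    for ell in range(1, steps + 1):
        se = math.sqrt(ell) * sigma_a  # square root of Var[e_h(ell)] = ell * sigma_a^2
        out.append((p_h, p_h - z * se, p_h + z * se))
    return out

# made-up forecast origin and shock standard deviation
intervals = random_walk_forecast(p_h=4.0, sigma_a=0.05, steps=4)
```

Every point forecast equals the value at the forecast origin, while the interval width grows in proportion to $\sqrt{\ell}$, so the 4-step interval is twice as wide as the 1-step interval.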


2.7.2 Random Walk with Drift 


As shown by empirical examples considered so far, the log return series of a market 
index tends to have a small and positive mean. This implies that the model for the 
log price is 


$$p_t = \mu + p_{t-1} + a_t, \tag{2.36}$$


where $\mu = E(p_t - p_{t-1})$ and $\{a_t\}$ is a zero-mean white noise series. The constant 
term $\mu$ of model (2.36) is very important in financial study. It represents the time 
trend of the log price $p_t$ and is often referred to as the drift of the model. To see 
this, assume that the initial log price is $p_0$. Then we have 


$$p_1 = \mu + p_0 + a_1,$$
$$p_2 = \mu + p_1 + a_2 = 2\mu + p_0 + a_2 + a_1,$$
$$\vdots$$
$$p_t = t\mu + p_0 + a_t + a_{t-1} + \cdots + a_1.$$


The last equation shows that the log price consists of a time trend $t\mu$ and a 
pure random-walk process $\sum_{i=1}^{t} a_i$. Because $\text{Var}(\sum_{i=1}^{t} a_i) = t\sigma_a^2$, where $\sigma_a^2$ is the 
variance of $a_t$, the conditional standard deviation of $p_t$ is $\sqrt{t}\sigma_a$, which grows at a 
slower rate than the conditional expectation of $p_t$. Therefore, if we graph $p_t$ against 
the time index $t$, we have a time trend with slope $\mu$. A positive slope $\mu$ implies 
that the log price eventually goes to infinity. In contrast, a negative $\mu$ implies that 
the log price would converge to $-\infty$ as $t$ increases. Based on the above discussion, 
it is then not surprising to see that the log return series of the CRSP value- and 
equal-weighted indexes have a small, but statistically significant, positive mean. 
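
The decomposition of the log price into a time trend and a pure random walk can be verified by simulation. The sketch below is illustrative only: the shocks are assumed Gaussian, the seed is arbitrary, and the drift, standard deviation, and sample size are borrowed from the 3M example that follows in the text. The function name is hypothetical.

```python
import random

def simulate_drift_walk(p0, mu, sigma_a, n, seed=0):
    """Simulate p_t = mu + p_{t-1} + a_t; returns (prices, shocks)."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, sigma_a) for _ in range(n)]
    prices, level = [], p0
    for a in shocks:
        level = mu + level + a
        prices.append(level)
    return prices, shocks

prices, shocks = simulate_drift_walk(p0=0.0, mu=0.0103, sigma_a=0.0637, n=755)
t = len(prices)
trend_part = t * 0.0103   # time trend t*mu
walk_part = sum(shocks)   # pure random walk a_1 + ... + a_t
```

Up to floating-point rounding, the last simulated log price equals `trend_part + walk_part` (with $p_0 = 0$), which is the last equation of the derivation above.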

To illustrate the effect of the drift parameter on the price series, we consider the 
monthly log stock returns of the 3M Company from February 1946 to December 
2008. As shown by the sample EACF in Table 2.5, the series has no significant 
serial correlation. The series thus follows the simple model 


$$r_t = 0.0103 + a_t, \qquad \hat{\sigma}_a = 0.0637, \tag{2.37}$$


where 0.0103 is the sample mean of $r_t$ and has a standard error 0.0023. The mean 
of the monthly log returns of 3M stock is, therefore, significantly different from 
zero at the 1% level. As a matter of fact, the one-sample test of zero mean shows 
a t ratio of 4.44 with a p value close to 0. We use the log return series to construct 
two log price series, namely 


$$p_t = \sum_{i=1}^{t} r_i \quad \text{and} \quad p_t^* = \sum_{i=1}^{t} a_i,$$


where $a_i$ is the mean-corrected log return in Eq. (2.37) (i.e., $a_t = r_t - 0.0103$). 
The $p_t$ is the log price of 3M stock, assuming that the initial log price is zero 
(i.e., the log price of January 1946 was zero). The $p_t^*$ is the corresponding log 
price if the mean of log returns was zero. Figure 2.10 shows the time plots of 
$p_t$ and $p_t^*$ as well as a straight line $y_t = 0.0103 \times t + 1946$, where $t$ is the time 
sequence of the returns and 1946 is the starting year of the stock. From the plots, 
the importance of the constant 0.0103 in Eq. (2.37) is evident. In addition, as 
expected, it represents the slope of the upward trend of $p_t$. 


Interpretation of the Constant Term 
From the previous discussions, it is important to understand the meaning of a 
constant term in a time series model. First, for an MA(q) model in Eq. (2.22), the 
constant term is simply the mean of the series. Second, for a stationary AR(p) model 
in Eq. (2.9) or ARMA(p, q) model in Eq. (2.28), the constant term is related to 
the mean via $\mu = \phi_0/(1 - \phi_1 - \cdots - \phi_p)$. Third, for a random walk with drift, the 
constant term becomes the time slope of the series. These different interpretations 
for the constant term in a time series model clearly highlight the difference between 
dynamic and usual linear regression models. 

Another important difference between dynamic and regression models is shown 
by an AR(1) model and a simple linear regression model, 


$$r_t = \phi_0 + \phi_1 r_{t-1} + a_t \quad \text{and} \quad y_t = \beta_0 + \beta_1 x_t + a_t.$$


For the AR(1) model to be meaningful, the coefficient $\phi_1$ must satisfy $|\phi_1| \le 1$. 
However, the coefficient $\beta_1$ can assume any fixed real number. 

Figure 2.10 Time plots of log prices for 3M stock from February 1946 to December 2008, assuming 
that log price of January 1946 was zero. The “o” line is for log price without time trend. Straight line 
is $y_t = 0.0103 \times t + 1946$. 


2.7.3 Trend-Stationary Time Series 


A closely related model that exhibits linear trend is the trend-stationary time series 
model, 


$$p_t = \beta_0 + \beta_1 t + r_t,$$


where $r_t$ is a stationary time series, for example, a stationary AR(p) series. Here $p_t$ 
grows linearly in time with rate $\beta_1$ and hence can exhibit behavior similar to that of 
a random-walk model with drift. However, there is a major difference between the 
two models. To see this, suppose that $p_0$ is fixed. The random-walk model with drift 
assumes the mean $E(p_t) = p_0 + \mu t$ and variance $\text{Var}(p_t) = t\sigma_a^2$, both of which are 
time dependent. On the other hand, the trend-stationary model assumes the mean 
$E(p_t) = \beta_0 + \beta_1 t$, which depends on time, and variance $\text{Var}(p_t) = \text{Var}(r_t)$, which 
is finite and time invariant. The trend-stationary series can be transformed into a 
stationary one by removing the time trend via a simple linear regression analysis. 
For analysis of trend-stationary time series, see the method of Section 2.9. 
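
Removing a linear time trend by regression can be sketched with ordinary least squares. This is a simplified illustration with a hypothetical function name: it ignores any serial correlation in $r_t$ when fitting the line, so it should not be read as the method of Section 2.9.

```python
def detrend(series):
    """Remove a fitted linear time trend b0 + b1*t (t = 1..n) by least squares."""
    n = len(series)
    times = range(1, n + 1)
    t_bar = sum(times) / n
    y_bar = sum(series) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, series))
    b1 = sxy / sxx
    b0 = y_bar - b1 * t_bar
    residuals = [y - (b0 + b1 * t) for t, y in zip(times, series)]
    return b0, b1, residuals

# made-up series: exact linear trend plus a small alternating component
noise = [0.1, -0.1] * 5
series = [2.0 + 0.5 * t + e for t, e in zip(range(1, 11), noise)]
b0, b1, residuals = detrend(series)
```

The residuals estimate the stationary component $r_t$; by construction of least squares they sum to zero.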


2.7.4 General Unit-Root Nonstationary Models 


Consider an ARMA model. If one extends the model by allowing the AR polynomial 
to have 1 as a characteristic root, then the model becomes the well-known 
autoregressive integrated moving-average (ARIMA) model. An ARIMA model is 
said to be unit-root nonstationary because its AR polynomial has a unit root. Like 
a random-walk model, an ARIMA model has strong memory because the $\psi_i$ coefficients 
in its MA representation do not decay over time to zero, implying that the 
past shock $a_{t-i}$ of the model has a permanent effect on the series. A conventional 
approach for handling unit-root nonstationarity is to use differencing. 


Differencing 

A time series $y_t$ is said to be an ARIMA($p$, 1, $q$) process if the change series
$c_t = y_t - y_{t-1} = (1 - B)y_t$ follows a stationary and invertible ARMA($p$, $q$) model.
In finance, price series are commonly believed to be nonstationary, but the log
return series, $r_t = \ln(P_t) - \ln(P_{t-1})$, is stationary. In this case, the log price series
is unit-root nonstationary and hence can be treated as an ARIMA process. The
idea of transforming a nonstationary series into a stationary one by considering
its change series is called differencing in the time series literature. More formally,
$c_t = y_t - y_{t-1}$ is referred to as the first differenced series of $y_t$. In some scientific
fields, a time series $y_t$ may contain multiple unit roots and needs to be differenced
multiple times to become stationary. For example, if both $y_t$ and its first differenced
series $c_t = y_t - y_{t-1}$ are unit-root nonstationary, but
$s_t = c_t - c_{t-1} = y_t - 2y_{t-1} + y_{t-2}$ is weakly stationary, then $y_t$ has double unit
roots, and $s_t$ is the second differenced series of $y_t$. In addition, if $s_t$ follows an
ARMA($p$, $q$) model, then $y_t$ is an ARIMA($p$, 2, $q$) process. For such a time series, if
$s_t$ has a nonzero mean, then $y_t$ has a quadratic time function and the quadratic time
coefficient is related to the mean of $s_t$. The seasonally adjusted series of U.S. quarterly
gross domestic product implicit price deflator might have double unit roots. However,
the mean of the second differenced series is not significantly different from zero; see the
Exercises of this chapter. Box, Jenkins, and Reinsel (1994) discuss many properties
of general ARIMA models.
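
First and second differencing can be checked numerically. The sketch below uses a
hypothetical series with double unit roots (white noise cumulated twice) and
verifies that the second differenced series agrees with $y_t - 2y_{t-1} + y_{t-2}$.

```r
set.seed(7)
s_true <- rnorm(200)               # stationary building block (illustrative)
y <- cumsum(cumsum(s_true))        # series with two unit roots

c_t <- diff(y)                     # first differenced series c_t = y_t - y_{t-1}
s_t <- diff(y, differences = 2)    # second differenced series s_t = c_t - c_{t-1}

# The same quantity written out as y_t - 2 y_{t-1} + y_{t-2}
n <- length(y)
s_manual <- y[3:n] - 2 * y[2:(n - 1)] + y[1:(n - 2)]
```

Each round of differencing shortens the series by one observation, so `s_t` has
`n - 2` values.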


2.7.5 Unit-Root Test 


To test whether the log price $p_t$ of an asset follows a random walk or a random
walk with drift, we employ the models

$$p_t = \phi_1 p_{t-1} + e_t, \tag{2.38}$$
$$p_t = \phi_0 + \phi_1 p_{t-1} + e_t, \tag{2.39}$$

where $e_t$ denotes the error term, and consider the null hypothesis $H_0: \phi_1 = 1$
versus the alternative hypothesis $H_a: \phi_1 < 1$. This is the well-known unit-root
testing problem; see Dickey and Fuller (1979). A convenient test statistic is the
$t$ ratio of the least-squares (LS) estimate of $\phi_1$ under the null hypothesis. For
Eq. (2.38), the LS method gives


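$$\hat{\phi}_1 = \frac{\sum_{t=1}^{T} p_{t-1} p_t}{\sum_{t=1}^{T} p_{t-1}^2}, \qquad
\hat{\sigma}_e^2 = \frac{\sum_{t=1}^{T} (p_t - \hat{\phi}_1 p_{t-1})^2}{T - 1},$$

where $p_0 = 0$ and $T$ is the sample size. The $t$ ratio is

$$\mathrm{DF} \equiv t\ \text{ratio} = \frac{\hat{\phi}_1 - 1}{\mathrm{std}(\hat{\phi}_1)}
= \frac{\sum_{t=1}^{T} p_{t-1} e_t}{\hat{\sigma}_e \sqrt{\sum_{t=1}^{T} p_{t-1}^2}},$$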


which is commonly referred to as the Dickey-Fuller (DF) test. If $\{e_t\}$ is a white
noise series with finite moments of order slightly greater than 2, then the DF statistic
converges to a function of the standard Brownian motion as $T \to \infty$; see Chan and
Wei (1988) and Phillips (1987) for more information. If $\phi_0$ is zero but Eq. (2.39)
is employed anyway, then the resulting $t$ ratio for testing $\phi_1 = 1$ will converge to
another nonstandard asymptotic distribution. In either case, simulation is used to
obtain critical values of the test statistics; see Fuller (1976, Chapter 8) for selected
critical values. Yet if $\phi_0 \neq 0$ and Eq. (2.39) is used, then the $t$ ratio for testing
$\phi_1 = 1$ is asymptotically normal. However, large sample sizes are needed for the
asymptotic normal distribution to hold. Standard Brownian motion is introduced in
Chapter 6.
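
The DF statistic for Eq. (2.38) can be computed directly from its definition. The
sketch below simulates a random walk (so the null hypothesis holds by construction)
and evaluates both forms of the $t$ ratio; the sample size and seed are arbitrary
choices, and the names `p_lag`, `df_stat`, and `df_stat2` are introduced only for
this illustration.

```r
set.seed(1)
T_n <- 500
e <- rnorm(T_n)
p <- cumsum(e)                 # random walk p_t, t = 1, ..., T, with p_0 = 0
p_lag <- c(0, p[-T_n])         # p_{t-1}, starting from p_0 = 0

# LS estimates for Eq. (2.38)
phi_hat <- sum(p_lag * p) / sum(p_lag^2)
sigma2_hat <- sum((p - phi_hat * p_lag)^2) / (T_n - 1)

# First form: (phi_hat - 1) / std(phi_hat)
df_stat <- (phi_hat - 1) / sqrt(sigma2_hat / sum(p_lag^2))

# Second form: uses e_t = p_t - p_{t-1}, which holds under the null
df_stat2 <- sum(p_lag * e) / (sqrt(sigma2_hat) * sqrt(sum(p_lag^2)))

# Cross-check against a regression without intercept
fit <- lm(p ~ -1 + p_lag)
t_lm <- (coef(fit)[["p_lag"]] - 1) /
  summary(fit)$coefficients["p_lag", "Std. Error"]
```

The statistic must be compared with Dickey-Fuller critical values, not with the
usual normal or Student-$t$ quantiles.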

For many economic time series, ARIMA($p$, $d$, $q$) models might be more appropriate
than the simple model in Eq. (2.39). In the econometric literature, AR($p$)
models are often used. Denote the series by $x_t$. To verify the existence of a unit
root in an AR($p$) process, one may perform the test $H_0: \beta = 1$ vs. $H_a: \beta < 1$
using the regression

$$x_t = c_t + \beta x_{t-1} + \sum_{i=1}^{p-1} \phi_i \Delta x_{t-i} + e_t, \tag{2.40}$$

where $c_t$ is a deterministic function of the time index $t$ and $\Delta x_j = x_j - x_{j-1}$ is the
differenced series of $x_t$. In practice, $c_t$ can be zero or a constant or $c_t = \omega_0 + \omega_1 t$.
The $t$ ratio of $\hat{\beta} - 1$,

$$\text{ADF-test} = \frac{\hat{\beta} - 1}{\mathrm{std}(\hat{\beta})},$$

where $\hat{\beta}$ denotes the least-squares estimate of $\beta$, is the well-known augmented
Dickey-Fuller (ADF) unit-root test. Note that because of the first differencing,
Eq. (2.40) is equivalent to an AR($p$) model with deterministic function $c_t$. Equation
(2.40) can also be rewritten as

$$\Delta x_t = c_t + \beta_c x_{t-1} + \sum_{i=1}^{p-1} \phi_i \Delta x_{t-i} + e_t,$$

where $\beta_c = \beta - 1$. One can then test the equivalent hypothesis $H_0: \beta_c = 0$ vs.
$H_a: \beta_c < 0$.
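
The equivalence of the level form (2.40) and the difference form can be verified with
ordinary least squares. The function `adf_fit` below is a hypothetical helper written
for this illustration (it is not part of any package); it takes $c_t$ to be a constant
and uses $p - 1$ lagged differences.

```r
adf_fit <- function(x, p) {
  stopifnot(p >= 1, length(x) > p + 2)
  dx <- diff(x)                          # dx[i] = x[i + 1] - x[i]
  k <- p - 1                             # number of lagged differences
  idx <- (k + 1):length(dx)              # rows with all lagged differences available
  lagged <- do.call(cbind, lapply(seq_len(k), function(i) dx[idx - i]))
  X <- cbind(x_lag = x[idx], lagged)     # x_{t-1} first, then Delta x_{t-i}
  fit_diff  <- lm(dx[idx] ~ X)           # difference form with a constant
  fit_level <- lm(x[idx + 1] ~ X)        # level form, Eq. (2.40), with a constant
  cd <- summary(fit_diff)$coefficients
  cl <- summary(fit_level)$coefficients
  list(beta_c    = cd[2, "Estimate"],    # row 2 is the coefficient of x_{t-1}
       adf       = cd[2, "t value"],
       beta      = cl[2, "Estimate"],
       adf_level = (cl[2, "Estimate"] - 1) / cl[2, "Std. Error"])
}

set.seed(2)
x <- cumsum(rnorm(300))                  # simulated unit-root series
res <- adf_fit(x, p = 4)
```

Both forms give the same statistic because the two regressions share the same
regressors and residuals; only the coefficient of $x_{t-1}$ shifts by 1.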
Figure 2.11 Log series of U.S. quarterly GDP from 1947.I to 2008.IV: (a) time plot of logged GDP
series, (b) sample ACF of log GDP data, (c) time plot of first differenced series, and (d) sample PACF
of differenced series.


Example 2.2. Consider the log series of U.S. quarterly GDP from 1947.1 to 
2008.IV. The series exhibits an upward trend, showing the growth of the U.S. 
economy, and has high sample serial correlations; see the lower left panel of 
Figure 2.11. The first differenced series, representing the growth rate of U.S. 
GDP and also shown in Figure 2.11, seems to vary around a fixed mean level, 
even though the variability appears to be smaller in recent years. To confirm the 
observed phenomenon, we apply the ADF unit-root test to the log series. Based 
on the sample PACF of the differenced series shown in Figure 2.11, we choose 
p = 10. Other values of p are also used, but they do not alter the conclusion of 
the test. With $p = 10$, the ADF test statistic is $-1.701$ with a $p$ value of 0.4297,
indicating that the unit-root hypothesis cannot be rejected. From the attached S-Plus
output, $\hat{\beta} = 1 + \hat{\beta}_c = 1 - 0.0008 = 0.9992$.


R Demonstration 


> library(fUnitRoots)
> da=read.table("q-gdp4708.txt",header=T)
> gdp=log(da[,4])                  # log GDP series
> m1=ar(diff(gdp),method='mle')    # select AR order for the differenced series
> m1$order
[1] 10
> adfTest(gdp,lags=10,type=c("c")) # ADF test with a constant


Title: 
Augmented Dickey-Fuller Test 


Test Results: 

PARAMETER: 
Lag Order: 10 
STATISTIC: 
Dickey-Fuller: -1.6109 
P VALUE: 0.4569 


S-Plus Demonstration 
The following output has been edited: 


> adft=unitroot(gdp,trend='c',method='adf',lags=10)
> summary(adft)

Test for Unit Root: Augmented DF Test

Null Hypothesis: there is a unit root
   Type of Test: t-test
 Test Statistic: -1.701
        P-value: 0.4297

Coefficients:
            Value  Std. Error  t value  Pr(>|t|)
lag1      -0.0008      0.0005  -1.7006    0.0904
lag2       0.3799      0.0659   5.7637    0.0000
lag3       0.1883      0.0696   2.7047    0.0074
lag10      0.1784      0.0637   2.8023    0.0055
constant   0.0134      0.0045   2.9636    0.0034

Regression Diagnostics:
         R-Squared 0.2877
Adjusted R-Squared 0.2564
Durbin-Watson Stat 1.9940

Residual standard error: 0.009318 on 234 degrees of freedom


As another example, consider the log series of the S&P 500 index from January
3, 1950, to April 16, 2008, for 14,462 observations. The series is shown in
Figure 2.12. Testing for a unit root in the index is relevant if one wishes to verify
empirically that the index follows a random walk with drift. To this end, we use
$c_t = \omega_0 + \omega_1 t$ in applying the ADF test. Furthermore, we choose $p = 15$ based
on the sample PACF of the first differenced series. The resulting test statistic is
$-1.998$ with a $p$ value of 0.602. Thus, the unit-root hypothesis cannot be rejected
at any reasonable significance level. The constant term is statistically significant,
whereas the estimate of the time trend is not at the usual 5% level. The latter is
significant at the 10% level, however. In summary, for the period from January
1950 to April 2008, the log series of the S&P 500 index contains a unit root and
a positive drift, but there is no strong evidence of a time trend.

Figure 2.12 Time plot of logarithm of daily S&P 500 index from January 3, 1950, to April 16, 2008.


R Demonstration 


> library(fUnitRoots)
> da=read.table("d-sp55008.txt",header=T)
> sp5=log(da[,7])
> m2=ar(diff(sp5),method='mle')
> m2$order
[1] 2
> adfTest(sp5,lags=2,type=("ct"))


Title: 
Augmented Dickey-Fuller Test 


Test Results: 

PARAMETER: 

Lag Order: 2 
STATISTIC: 
Dickey-Fuller: -2.0179 
P VALUE: 0.5708 


> adfTest(sp5,lags=15,type=("ct"))

Title:
Augmented Dickey-Fuller Test

Test Results:
PARAMETER:
Lag Order: 15
STATISTIC:
Dickey-Fuller: -1.9946
P VALUE: 0.5807


S-Plus Demonstration
The following output has been edited:


> adft=unitroot(sp5,method='adf',trend='ct',lags=15)
> summary(adft)

Test for Unit Root: Augmented DF Test

Null Hypothesis: there is a unit root
   Type of Test: t-test
 Test Statistic: -1.998
        P-value: 0.602

Coefficients:
            Value  Std. Error  t value  Pr(>|t|)
lag1      -0.0005      0.0003
lag2       0.0722      0.0083
lag3      -0.0386      0.0083
lag4      -0.0071      0.0083

lag15      0.0133      0.0083
constant   0.0019      0.0008
time       0.0020      0.0011

Regression Diagnostics:
         R-Squared 0.0081
Adjusted R-Squared 0.0070
Durbin-Watson Stat 1.9995

Residual standard error: 0.008981 on 14643 degrees of freedom


2.8 SEASONAL MODELS


Some financial time series such as quarterly earnings per share of a company
exhibit certain cyclical or periodic behavior. Such a time series is called a
seasonal time series. Figure 2.13(a) shows the time plot of quarterly earnings per share
of Johnson & Johnson from the first quarter of 1960 to the last quarter of 1980.

Figure 2.13 Time plots of quarterly earnings per share of Johnson & Johnson from 1960 to 1980:
(a) observed earnings and (b) log earnings.


The data obtained from Shumway and Stoffer (2000) possess some special char- 
acteristics. In particular, the earnings grew exponentially during the sample period 
and had a strong seasonality. Furthermore, the variability of earnings increased 
over time. The cyclical pattern repeats itself every year so that the periodicity of 
the series is 4. If monthly data are considered (e.g., monthly sales of Wal-Mart 
stores), then the periodicity is 12. Seasonal time series models are also useful in 
pricing weather-related derivatives and energy futures because most environmental 
time series exhibit strong seasonal behavior. 

Analysis of seasonal time series has a long history. In some applications, sea- 
sonality is of secondary importance and is removed from the data, resulting in a 
seasonally adjusted time series that is then used to make inference. The procedure 
to remove seasonality from a time series is referred to as seasonal adjustment. Most 
economic data published by the U.S. government are seasonally adjusted (e.g., the 
growth rate of gross domestic product and the unemployment rate). In other appli- 
cations such as forecasting, seasonality is as important as other characteristics of 
the data and must be handled accordingly. Because forecasting is a major objective 
of financial time series analysis, we focus on the latter approach and discuss some 
econometric models that are useful in modeling seasonal time series. 


2.8.1 Seasonal Differencing 


Figure 2.13(b) shows the time plot of log earnings per share of Johnson & Johnson. 
We took the log transformation for two reasons. First, it is used to handle the
exponential growth of the series. Indeed, the plot confirms that the growth is linear
in the log scale. Second, the transformation is used to stabilize the variability of
the series. Again, the increasing pattern in variability of Figure 2.13(a) disappears 
in the new plot. Log transformation is commonly used in analysis of financial 
and economic time series. In this particular instance, all earnings are positive so 
that no adjustment is needed before taking the transformation. In some cases, 
one may need to add a positive constant to every data point before taking the 
transformation. 

Denote the log earnings by $x_t$. The upper left panel of Figure 2.14 shows the sample
ACF of $x_t$, which indicates that the quarterly log earnings per share has strong
serial correlations. A conventional method to handle such strong serial correlations
is to consider the first differenced series of $x_t$ [i.e., $\Delta x_t = x_t - x_{t-1} = (1 - B)x_t$].
The lower left plot of Figure 2.14 gives the sample ACF of $\Delta x_t$. The ACF is strong
when the lag is a multiple of periodicity 4. This is a well-documented behavior
of sample ACF of a seasonal time series. Following the procedure of Box,
Jenkins, and Reinsel (1994, Chapter 9), we take another difference of the data,
that is,

$$\Delta_4(\Delta x_t) = (1 - B^4)\Delta x_t = \Delta x_t - \Delta x_{t-4} = x_t - x_{t-1} - x_{t-4} + x_{t-5}.$$
Figure 2.14 Sample ACF of log series of quarterly earnings per share of Johnson & Johnson from
1960 to 1980. (a) log earnings, (b) first differenced series, (c) seasonally differenced series, and (d)
series with regular and seasonal differencing.

The operation $\Delta_4 = (1 - B^4)$ is called a seasonal differencing. In general, for a
seasonal time series $y_t$ with periodicity $s$, seasonal differencing means

$$\Delta_s y_t = y_t - y_{t-s} = (1 - B^s) y_t.$$

The conventional difference $\Delta y_t = y_t - y_{t-1} = (1 - B)y_t$ is referred to as the
regular differencing. The lower right plot of Figure 2.14 shows the sample ACF
of $\Delta_4 \Delta x_t$, which has a significant negative ACF at lag 1 and a marginal negative
correlation at lag 4. For completeness, Figure 2.14 also gives the sample ACF of
the seasonally differenced series $\Delta_4 x_t$.
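
In R, regular and seasonal differencing are both available through `diff()`. The
short sketch below uses a hypothetical quarterly series (not the Johnson & Johnson
data) to confirm that applying both differences reproduces the expanded form
$x_t - x_{t-1} - x_{t-4} + x_{t-5}$ and that the order of the two operations does
not matter.

```r
set.seed(3)
x <- cumsum(rnorm(40))            # hypothetical quarterly series x_t
n <- length(x)

d1   <- diff(x)                   # regular differencing, (1 - B) x_t
d4   <- diff(x, lag = 4)          # seasonal differencing, (1 - B^4) x_t
d4d1 <- diff(diff(x), lag = 4)    # (1 - B^4)(1 - B) x_t
d1d4 <- diff(diff(x, lag = 4))    # (1 - B)(1 - B^4) x_t

# Expanded form for t = 6, ..., n
manual <- x[6:n] - x[5:(n - 1)] - x[2:(n - 4)] + x[1:(n - 5)]
```

The combined operation costs $s + 1 = 5$ observations for quarterly data.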


2.8.2 Multiplicative Seasonal Models 


The behavior of the sample ACF of $(1 - B^4)(1 - B)x_t$ in Figure 2.14 is common
among seasonal time series. It led to the development of the following special
seasonal time series model:

$$(1 - B^s)(1 - B)x_t = (1 - \theta B)(1 - \Theta B^s)a_t, \tag{2.41}$$

where $s$ is the periodicity of the series, $a_t$ is a white noise series, $|\theta| < 1$, and
$|\Theta| < 1$. This model is referred to as the airline model in the literature; see Box,
Jenkins, and Reinsel (1994, Chapter 9). It has been found to be widely applicable
in modeling seasonal time series. The AR part of the model simply consists of the
regular and seasonal differences, whereas the MA part involves two parameters.
Focusing on the MA part (i.e., on the model),


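$$w_t = (1 - \theta B)(1 - \Theta B^s) a_t = a_t - \theta a_{t-1} - \Theta a_{t-s} + \theta\Theta a_{t-s-1},$$

where $w_t = (1 - B^s)(1 - B)x_t$ and $s > 1$. It is easy to obtain that $E(w_t) = 0$ and

$$\begin{aligned}
\mathrm{Var}(w_t) &= (1 + \theta^2)(1 + \Theta^2)\sigma_a^2, \\
\mathrm{Cov}(w_t, w_{t-1}) &= -\theta(1 + \Theta^2)\sigma_a^2, \\
\mathrm{Cov}(w_t, w_{t-s+1}) &= \theta\Theta\sigma_a^2, \\
\mathrm{Cov}(w_t, w_{t-s}) &= -\Theta(1 + \theta^2)\sigma_a^2, \\
\mathrm{Cov}(w_t, w_{t-s-1}) &= \theta\Theta\sigma_a^2, \\
\mathrm{Cov}(w_t, w_{t-\ell}) &= 0, \quad \text{for } \ell \neq 0, 1, s-1, s, s+1.
\end{aligned}$$

Consequently, the ACF of the $w_t$ series is given by

$$\rho_1 = \frac{-\theta}{1+\theta^2}, \quad \rho_s = \frac{-\Theta}{1+\Theta^2}, \quad
\rho_{s-1} = \rho_{s+1} = \rho_1\rho_s = \frac{\theta\Theta}{(1+\theta^2)(1+\Theta^2)},$$

and $\rho_\ell = 0$ for $\ell > 0$ and $\ell \neq 1, s-1, s, s+1$. For example, if $w_t$ is a quarterly
time series, then $s = 4$ and for $\ell > 0$, the ACF $\rho_\ell$ is nonzero at lags 1, 3, 4, and
5 only.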
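
These ACF formulas can be checked numerically with `ARMAacf()` from the stats
package. The parameter values below are illustrative only. Note that R writes an MA
polynomial as $1 + b_1 B + b_2 B^2 + \cdots$, so the signs of $\theta$ and $\Theta$
are flipped when the coefficients are passed in.

```r
theta <- 0.678   # regular MA parameter (illustrative value)
Theta <- 0.314   # seasonal MA parameter (illustrative value)
s <- 4           # quarterly periodicity

# w_t = a_t - theta a_{t-1} - Theta a_{t-s} + theta * Theta a_{t-s-1}
ma_coef <- c(-theta, rep(0, s - 2), -Theta, theta * Theta)
rho <- ARMAacf(ma = ma_coef, lag.max = s + 2)   # named by lag: "0", "1", ...

# Closed-form values from the text
rho_1 <- -theta / (1 + theta^2)
rho_s <- -Theta / (1 + Theta^2)

round(rho, 4)
```

Only lags 1, 3, 4, and 5 carry nonzero autocorrelation, and the values at lags 3 and
5 equal the product `rho_1 * rho_s`.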

It is interesting to compare the prior ACF with those of the MA(1) model
$y_t = (1 - \theta B)a_t$ and the MA($s$) model $z_t = (1 - \Theta B^s)a_t$. The ACF of $y_t$ and $z_t$
series are

$$\rho_1(y) = \frac{-\theta}{1+\theta^2}, \quad \text{and} \quad \rho_\ell(y) = 0, \quad \ell > 1,$$
$$\rho_s(z) = \frac{-\Theta}{1+\Theta^2}, \quad \text{and} \quad \rho_\ell(z) = 0, \quad \ell > 0,\ \ell \neq s.$$

We see that (i) $\rho_1 = \rho_1(y)$, (ii) $\rho_s = \rho_s(z)$, and (iii) $\rho_{s-1} = \rho_{s+1} = \rho_1(y) \times \rho_s(z)$.
Therefore, the ACF of $w_t$ at lags $(s - 1)$ and $(s + 1)$ can be regarded as the
interaction between lag-1 and lag-$s$ serial dependence, and the model of $w_t$ is
called a multiplicative seasonal MA model. In practice, a multiplicative seasonal
model says that the dynamics of the regular and seasonal components of the series
are approximately orthogonal.

The model 


$$w_t = (1 - \theta B - \Theta B^s)a_t, \tag{2.42}$$

where $|\theta| < 1$ and $|\Theta| < 1$, is a nonmultiplicative seasonal MA model. It is easy
to see that for the model in Eq. (2.42), $\rho_{s+1} = 0$. A multiplicative model is more
parsimonious than the corresponding nonmultiplicative model because both models
use the same number of parameters, but the multiplicative model has more nonzero
ACFs.
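
The difference between the two specifications shows up at lag $s + 1$. The following
sketch compares the theoretical ACFs of the multiplicative model and of model (2.42)
for the same illustrative parameter values; `rho_mult` and `rho_non` are names used
only here.

```r
theta <- 0.678; Theta <- 0.314; s <- 4     # illustrative values only

# Multiplicative MA part: (1 - theta B)(1 - Theta B^s)
rho_mult <- ARMAacf(ma = c(-theta, rep(0, s - 2), -Theta, theta * Theta),
                    lag.max = s + 1)

# Nonmultiplicative MA part, Eq. (2.42): 1 - theta B - Theta B^s
rho_non <- ARMAacf(ma = c(-theta, rep(0, s - 2), -Theta),
                   lag.max = s + 1)

round(rbind(multiplicative = rho_mult, nonmultiplicative = rho_non), 4)
```

For model (2.42) the lag-$(s-1)$ autocorrelation is
$\theta\Theta/(1 + \theta^2 + \Theta^2)$, which is nonzero, while the lag-$(s+1)$
autocorrelation vanishes.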


Example 2.3. In this example we apply the airline model to the log series of 
quarterly earnings per share of Johnson & Johnson from 1960 to 1980. Based on 
the exact-likelihood method, the fitted model is 


$$(1 - B)(1 - B^4)x_t = (1 - 0.678B)(1 - 0.314B^4)a_t, \qquad \hat{\sigma}_a = 0.089,$$

where standard errors of the two MA parameters are 0.080 and 0.101, respectively.
The Ljung-Box statistics of the residuals show $Q(12) = 10.0$ with a $p$ value of
0.44. The model appears to be adequate.
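
A fit of this kind can be sketched in R with `arima()`. Because the earnings data
file is not used here, the code simulates a series from an airline model whose
parameters are borrowed from the fitted model above; the sample size, seed, and
zero starting values are assumptions of the sketch, so the estimates will not
reproduce the reported ones.

```r
set.seed(11)
n <- 400; s <- 4
theta <- 0.678; Theta <- 0.314          # borrowed from the fitted model above

# w_t = (1 - theta B)(1 - Theta B^s) a_t via a one-sided linear filter
a <- rnorm(n + s + 1, sd = 0.089)
w <- stats::filter(a, c(1, -theta, rep(0, s - 2), -Theta, theta * Theta),
                   sides = 1)
w <- as.numeric(w[-seq_len(s + 1)])     # drop the NA start-up values

# Integrate so that (1 - B)(1 - B^s) x_t = w_t, with zero starting values
x <- diffinv(diffinv(w, lag = s), lag = 1)

# Airline model estimated by exact maximum likelihood
fit <- arima(x, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = s), method = "ML")

# R parameterizes the MA factors as (1 + b B), so theta = -ma1 and Theta = -sma1
theta_hat <- -coef(fit)[["ma1"]]
Theta_hat <- -coef(fit)[["sma1"]]

# Ljung-Box check of the residuals, adjusting for the two fitted parameters
lb <- Box.test(residuals(fit), lag = 12, type = "Ljung-Box", fitdf = 2)
```

The sign convention is the main practical trap: a reported `ma1` of about $-0.68$
corresponds to $\theta \approx 0.68$ in the notation of Eq. (2.41).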

To illustrate the forecasting performance of the prior seasonal model, we reestimate
the model using the first 76 observations and reserve the last 8 data points
for forecasting evaluation. We compute 1-step- to 8-step-ahead forecasts and their
standard errors of the fitted model at the forecast origin $h = 76$. An antilog
transformation is taken to obtain forecasts of earnings per share using the relationship
between normal and lognormal distributions given in Chapter 1. Figure 2.15 shows
the forecast performance of the model, where the observed data are in solid line,
point forecasts are shown by dots, and the dashed lines show 95% interval forecasts.
The forecasts show a strong seasonal pattern and are close to the observed
data. Finally, for an alternative approach to modeling the quarterly earnings data,
see Example 11.3.

Figure 2.15 Out-of-sample point and interval forecasts for quarterly earnings of Johnson & Johnson.
Forecast origin is fourth quarter of 1978. In plot, solid line shows actual observations, dots represent
point forecasts, and dashed lines show 95% interval forecasts.
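
The antilog step mentioned in Example 2.3 can be sketched as follows. The helper
`antilog_forecast` is hypothetical and assumes that the forecast of the log series is
normal with the given mean and standard error, so that the forecast on the original
scale is lognormal with mean $\exp(\mu + \sigma^2/2)$; interval limits are obtained by
exponentiating the limits on the log scale. The input numbers are made up for
illustration.

```r
antilog_forecast <- function(log_mean, log_se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)               # two-sided normal quantile
  data.frame(
    point = exp(log_mean + log_se^2 / 2),       # lognormal mean
    lower = exp(log_mean - z * log_se),         # interval limits mapped from log scale
    upper = exp(log_mean + z * log_se)
  )
}

# Hypothetical 1-step- and 2-step-ahead log forecasts and standard errors
fc <- antilog_forecast(log_mean = c(2.9, 2.7), log_se = c(0.09, 0.10))
```

With output from `predict()` applied to a fitted `arima` object, `log_mean` and
`log_se` would be the `pred` and `se` components.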

When the seasonal pattern of a time series is stable over time (e.g., close to a 
deterministic function), dummy variables may be used to handle the seasonality. 
This approach is taken by some analysts. However, deterministic seasonality is a 
special case of the multiplicative seasonal model discussed before. Specifically, 
if $\Theta = 1$, then model (2.41) contains a deterministic seasonal component. Consequently,
the same forecasts are obtained by using either dummy variables or a
multiplicative seasonal model when the seasonal pattern is deterministic. Yet use of 
dummy variables can lead to inferior forecasts if the seasonal pattern is not deter- 
ministic. In practice, we recommend that the exact-likelihood method should be 
used to estimate a multiplicative seasonal model, especially when the sample size is 
small or when there is the possibility of having a deterministic seasonal component. 


Example 2.4. To demonstrate deterministic seasonal behavior, consider the 
monthly simple returns of the CRSP Decile 1 Index from January 1970 to December 
2008 for 468 observations. The series is shown in Figure 2.16(a), and the time plot 
does not show any clear pattern of seasonality. However, the sample ACF of the 
return series shown in Figure 2.16(b) contains significant lags at 12, 24, and 36 as 
well as lag 1. If seasonal ARMA models are entertained, a model in the form

$$(1 - \phi_1 B)(1 - \phi_{12} B^{12}) R_t = (1 - \theta_{12} B^{12}) a_t$$

is identified, where $R_t$ denotes the monthly simple return. Using the
conditional-likelihood method, the fitted model is

$$(1 - 0.18B)(1 - 0.87B^{12}) R_t = (1 - 0.74B^{12}) a_t, \qquad \hat{\sigma}_a = 0.069.$$

Figure 2.16 Monthly simple returns of CRSP Decile 1 index from January 1970 to December 2008:
(a) time plot of the simple returns, (b) sample ACF of simple returns, (c) time plot of simple returns
after adjusting for January effect, and (d) sample ACF of adjusted simple returns.


See the attached SCA (Scientific Computing Associates) output below. The esti- 
mates of the seasonal AR and MA coefficients are of similar magnitude. If the 
exact-likelihood method is used, we have 


$$(1 - 0.188B)(1 - 0.951B^{12}) R_t = (1 - 0.997B^{12}) a_t, \qquad \hat{\sigma}_a = 0.063.$$


The cancellation between seasonal AR and MA factors is clearly seen. This highlights
the usefulness of the exact-likelihood method, and the estimation result
suggests that the seasonal behavior might be deterministic. To further confirm this
assertion, we define the dummy variable for January, that is,

$$\mathrm{Jan}_t = \begin{cases} 1 & \text{if } t \text{ is January}, \\ 0 & \text{otherwise}, \end{cases}$$

and employ the simple linear regression

$$R_t = \beta_0 + \beta_1 \mathrm{Jan}_t + e_t.$$
The fitted model is $R_t = 0.0029 + 0.1253\,\mathrm{Jan}_t + e_t$, where the standard errors of
the estimates are 0.0033 and 0.0115, respectively. The right panel of Figure 2.16
shows the time plot and sample ACF of the residual series of the prior simple linear
regression. From the sample ACF, serial correlations at lags 12, 24, and 36 largely
disappear, suggesting that the seasonal pattern of the Decile 1 returns has been
successfully removed by the January dummy variable. Consequently, the seasonal
behavior in the monthly simple return of Decile 1 is mainly due to the January
effect.


R Demonstration 
The following output has been edited and % denotes explanation: 


> da=read.table("m-deciles08.txt",header=T)
> d1=da[,2]
> jan=rep(c(1,rep(0,11)),39) % Create January dummy.
> m1=lm(d1~jan)
> summary(m1)
Call:
lm(formula = d1 ~ jan)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.002864   0.003333   0.859    0.391
jan          0.125251   0.011546  10.848   <2e-16 ***

Residual standard error: 0.06904 on 466 degrees of freedom
Multiple R-squared: 0.2016, Adjusted R-squared: 0.1999

> m2=arima(d1,order=c(1,0,0),seasonal=list(order=c(1,0,1),
+ period=12))
> m2
Coefficients:
         ar1    sar1     sma1  intercept
      0.1769  0.9882  -0.9144     0.0118
s.e.  0.0456  0.0093   0.0335     0.0129

sigma^2 estimated as 0.004717: log likelihood=584.07,
aic=-1158.14
> tsdiag(m2,gof=36) % plot not shown.

> m2=arima(d1,order=c(1,0,0),seasonal=list(order=c(1,0,1),
+ period=12),include.mean=F)
> m2
Call:
arima(x=d1,order=c(1,0,0),seasonal=list(order=c(1,0,1),
period=12),include.mean = F)

Coefficients:
         ar1    sar1     sma1
      0.1787  0.9886  -0.9127 % Slightly differ from those of SCA.
s.e.  0.0456  0.0089   0.0335

sigma^2 estimated as 0.00472: log likelihood=583.68,
aic=-1159.36

SCA Demonstration 
The following output has been edited: 


input date,dec1,d2,d9,d10. file 'm-deciles08.txt'.

tsm m1. model (1)(12)dec1=(12)noise.

estim m1. hold resi(r1). % Conditional MLE estimation

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- M1

VAR    TYPE OF    ORIGINAL      DIFFERENCING
       VARIABLE   OR CENTERED
DEC1   RANDOM     ORIGINAL      NONE

PAR.   VAR.  NUM./   FACTOR  ORDER  CONS-   VALUE   STD    T
LABEL  NAME  DENOM.                 TRAINT          ERROR  VALUE
1      D1    MA      1       12     NONE    .7388   .0488  15.14
2      D1    AR      1       1      NONE    .1765   .0447   3.95
3      D1    AR      2       12     NONE    .8698   .0295  29.49

EFFECTIVE NUMBER OF OBSERVATIONS . .  455
R-SQUARE . . . . . . . . . . . . . .  0.199
RESIDUAL STANDARD ERROR. . . . . . .  0.689906E-01
RESIDUAL STANDARD ERROR. . . . . . .  0.705662E-01

estim m1. method exact. hold resi(r1) % Exact MLE estimation

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- M1

VAR.   TYPE OF    ORIGINAL      DIFFERENCING
       VAR.       OR CENTERED
DEC1   RANDOM     ORIGINAL      NONE

PAR.   VAR.  NUM./   FACTOR  ORDER  CONS-   VALUE   STD    T
LABEL  NAME  DENOM.                 TRAINT          ERROR  VALUE
1      D1    MA      1       12     NONE    .9968   .0150   66.31
2      D1    AR      1       1      NONE    .1884   .0448    4.21
3      D1    AR      2       12     NONE    .9505   .0070  135.46

EFFECTIVE NUMBER OF OBSERVATIONS . .  455
R-SQUARE . . . . . . . . . . . . . .  0.328
RESIDUAL STANDARD ERROR. . . . . . .  0.631807E-01


2.9 REGRESSION MODELS WITH TIME SERIES ERRORS 


In many applications, the relationship between two time series is of major interest. 
An obvious example is the market model in finance that relates the excess return 
of an individual stock to that of a market index. The term structure of interest rates 
is another example in which the time evolution of the relationship between interest 
rates with different maturities is investigated. These examples lead naturally to the 
consideration of a linear regression in the form 


$$y_t = \alpha + \beta x_t + e_t, \tag{2.43}$$


where $y_t$ and $x_t$ are two time series and $e_t$ denotes the error term. The
least-squares (LS) method is often used to estimate model (2.43). If $\{e_t\}$ is a white noise
series, then the LS method produces consistent estimates. In practice, however, it
is common to see that the error term $e_t$ is serially correlated. In this case, we have
a regression model with time series errors, and the LS estimates of $\alpha$ and $\beta$ may
not be consistent.

A regression model with time series errors is widely applicable in economics 
and finance, but it is one of the most commonly misused econometric models 
because the serial dependence in $e_t$ is often overlooked. It pays to study the model
carefully. 

We introduce the model by considering the relationship between two U.S. weekly 
interest rate series: 


1. $r_{1t}$: the 1-year Treasury constant maturity rate
2. $r_{3t}$: the 3-year Treasury constant maturity rate


Both series have 2467 observations from January 5, 1962, to April 10, 2009,
and are measured in percentages. The series are obtained from the Federal
Reserve Bank of St. Louis. Strictly speaking, we should model the two interest
series jointly using multivariate time series analysis in Chapter 8. However, for 
simplicity, we focus here on regression-type analysis and ignore the issue of 
simultaneity. 

Figure 2.17 Time plots of U.S. weekly interest rates (in percentages) from January 5, 1962, to April
10, 2009. Solid line is Treasury 1-year constant maturity rate and dashed line Treasury 3-year constant
maturity rate.

Figure 2.17 shows the time plots of the two interest rates with a solid line
denoting the 1-year rate and a dashed line the 3-year rate. Figure 2.18(a) plots $r_{1t}$
versus $r_{3t}$, indicating that, as expected, the two interest rates are highly correlated.
A naive way to describe the relationship between the two interest rates is to use
the simple model $r_{3t} = \alpha + \beta r_{1t} + e_t$. This results in a fitted model

$$r_{3t} = 0.832 + 0.930 r_{1t} + e_t, \qquad \hat{\sigma}_e = 0.523 \tag{2.44}$$

with $R^2 = 96.5\%$, where the standard errors of the two coefficients are 0.024 and
0.004, respectively. Model (2.44) confirms the high correlation between the two
interest rates. However, the model is seriously inadequate, as shown by Figure 2.19, 
which gives the time plot and ACF of its residuals. In particular, the sample ACF 
of the residuals is highly significant and decays slowly, showing the pattern of 
a unit-root nonstationary time series. The behavior of the residuals suggests that 
marked differences exist between the two interest rates. Using the modern econo- 
metric terminology, if one assumes that the two interest rate series are unit-root 
nonstationary, then the behavior of the residuals of Eq. (2.44) indicates that the two 
interest rates are not cointegrated; see Chapter 8 for discussion of cointegration. 
In other words, the data fail to support the hypothesis that there exists a long-term 
equilibrium between the two interest rates. In some sense, this is not surprising 
because the pattern of “inverted yield curve” did occur during the data span. By 
inverted yield curve we mean the situation under which interest rates are inversely 
related to their time to maturities. 

Figure 2.18 Scatterplots of U.S. weekly interest rates from January 5, 1962, to April 10, 2009:
(a) 3-year rate vs. 1-year rate and (b) changes in 3-year rate vs. changes in 1-year rate.

The unit-root behavior of both interest rates and the residuals of Eq. (2.44) leads
to the consideration of the change series of interest rates. Let

1. $c_{1t} = r_{1t} - r_{1,t-1} = (1 - B)r_{1t}$ for $t \geq 2$: changes in the 1-year interest rate
2. $c_{3t} = r_{3t} - r_{3,t-1} = (1 - B)r_{3t}$ for $t \geq 2$: changes in the 3-year interest rate

and consider the linear regression $c_{3t} = \beta c_{1t} + e_t$. Figure 2.20 shows time plots of
the two change series, whereas Figure 2.18(b) provides a scatterplot between them.
The change series remain highly correlated with a fitted linear regression model
given by

$$c_{3t} = 0.792 c_{1t} + e_t, \qquad \hat{\sigma}_e = 0.0690, \tag{2.45}$$

with $R^2 = 82.5\%$. The standard error of the coefficient is 0.0073. This model
further confirms the strong linear dependence between interest rates. Figure 2.21 
shows the time plot and sample ACF of the residuals of Eq. (2.45). Once again, 
the ACF shows some significant serial correlations in the residuals, but magnitudes 
of the correlations are much smaller. This weak serial dependence in the residuals 
can be modeled by using the simple time series models discussed in the previous 
sections, and we have a linear regression with time series errors. 
The main objective of this section is to discuss a simple approach for building a
linear regression model with time series errors. The approach is straightforward. We
employ a simple time series model discussed in this chapter for the residual series
and estimate the whole model jointly. For illustration, consider the simple linear
regression in Eq. (2.45). Because residuals of the model are serially correlated, we
shall identify a simple ARMA model for the residuals. From the sample ACF of
the residuals shown in Figure 2.21, we specify an MA(1) model for the residuals
and modify the linear regression model to

$$c_{3t} = \beta c_{1t} + e_t, \qquad e_t = a_t - \theta_1 a_{t-1}, \tag{2.46}$$

where $\{a_t\}$ is assumed to be a white noise series. In other words, we simply use
an MA(1) model, without the constant term, to capture the serial dependence in
the error term of Eq. (2.45). The resulting model is a simple example of linear
regression with time series errors. In practice, more elaborate time series models
can be added to a linear regression equation to form a general regression model
with time series errors.

Figure 2.19 Residual series of linear regression (2.44) for two U.S. weekly interest rates: (a) time
plot and (b) sample ACF.

Figure 2.20 Time plots of change series of U.S. weekly interest rates from January 12, 1962, to April
10, 2009: (a) changes in Treasury 1-year constant maturity rate and (b) changes in Treasury 3-year
constant maturity rate.

Estimating a regression model with time series errors was not easy before the
advent of modern computers. Special methods such as the Cochrane-Orcutt estimator
have been proposed to handle the serial dependence in the residuals; see Greene
(2003, p. 273). By now, the estimation is as easy as that of other time series models.
If the time series model used is stationary and invertible, then one can estimate
the model jointly via the maximum-likelihood method. This is the approach we
take by using either the SCA or R package. R and S-Plus demonstrations are given
later. For the U.S. weekly interest rate data, the fitted version of model (2.46) is

$$c_{3t} = 0.794 c_{1t} + e_t, \qquad e_t = a_t + 0.1823 a_{t-1}, \qquad \hat{\sigma}_a = 0.0678, \tag{2.47}$$

with $R^2 = 83.1\%$. The standard errors of the parameters are 0.0075 and 0.0196,
respectively. The model no longer has a significant lag-1 residual ACF, even though 
some minor residual serial correlations remain at lags 4, 6, and 7. The incremental 
improvement of adding additional MA parameters at lags 4, 6, and 7 to the residual 
equation is small and the result is not reported here. 

Comparing the models in Eqs. (2.44), (2.45), and (2.47), we make the following
observations. First, the high $R^2$ of 96.5% and coefficient 0.930 of model (2.44) are
misleading because the residuals of the model show strong serial correlations.
Second, for the change series, $R^2$ and the coefficient of $c_{1t}$ of models (2.45) and
(2.47) are close. In this particular instance, adding the MA(1) model to the change
series only provides a marginal improvement. This is not surprising because the 
estimated MA coefficient is small numerically, even though it is statistically highly 
significant. Third, the analysis demonstrates that it is important to check residual 
serial dependence in linear regression analysis. 

Figure 2.21 Residual series of linear regression (2.45) for two change series of U.S. weekly interest 
rates: (a) time plot and (b) sample ACF. 


From Eq. (2.47), the model shows that the two weekly interest rate series are 
related as 


$$r_{3t} = r_{3,t-1} + 0.794 (r_{1t} - r_{1,t-1}) + a_t + 0.182 a_{t-1}.$$

The interest rates are concurrently and serially correlated. 


R Demonstration 
The following output has been edited. 


> r1=read.table("w-gs1yr.txt",header=T)[,4] 
> r3=read.table("w-gs3yr.txt",header=T)[,4] 
> m1=lm(r3~r1) 
> summary(m1) 

Call: 
lm(formula = r3 ~ r1) 

Coefficients: 
            Estimate Std. Error t value Pr(>|t|) 
(Intercept)  0.83214    0.02417   34.43   <2e-16 *** 
r1           0.92955    0.00357  260.40   <2e-16 *** 

Residual standard error: 0.5228 on 2465 degrees of freedom 
Multiple R-squared: 0.9649, Adjusted R-squared: 0.9649 

> plot(m1$residuals,type='l') 
> acf(m1$residuals,lag=36) 
> c1=diff(r1) 
> c3=diff(r3) 
> m2=lm(c3~-1+c1) 
> summary(m2) 

Call: 
lm(formula = c3 ~ -1 + c1) 

Coefficients: 
   Estimate Std. Error t value Pr(>|t|) 
c1 0.791935   0.007337   107.9   <2e-16 *** 

Residual standard error: 0.06896 on 2465 degrees of freedom 
Multiple R-squared: 0.8253, Adjusted R-squared: 0.8253 

> acf(m2$residuals,lag=36) 

> m3=arima(c3,order=c(0,0,1),xreg=c1,include.mean=F) 
> m3 
Call: 
arima(x = c3, order = c(0, 0, 1), xreg = c1, include.mean = F) 
Coefficients: 
         ma1      c1 
      0.1823  0.7936 
s.e.  0.0196  0.0075 

sigma^2 estimated as 0.0046: log likelihood=3136.62, 
aic=-6267.23 

> rsq=(sum(c3^2)-sum(m3$residuals^2))/sum(c3^2) 
> rsq 
[1] 0.8310077 


Summary 
We outline a general procedure for analyzing linear regression models with time 
series errors: 


1. Fit the linear regression model and check serial correlations of the 
residuals. 

2. If the residual series is unit-root nonstationary, take the first difference of 
both the dependent and explanatory variables. Go to step 1. If the residual 
series appears to be stationary, identify an ARMA model for the residuals 
and modify the linear regression model accordingly. 

3. Perform a joint estimation via the maximum-likelihood method and check 
the fitted model for further improvement. 
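
The joint estimation in step 3 can be illustrated with a small Python sketch. It is not the exact maximum-likelihood method used by SCA or R: it is a conditional least-squares approximation that assumes a single regressor, no intercept, MA(1) errors, and a zero presample shock. The function names, the grid step, and the simulated data are all assumptions made for illustration.

```python
import random

def ma1_filter(z, theta):
    """Apply (1 + theta*B)^(-1) to z, conditioning on a zero presample value."""
    out, prev = [], 0.0
    for v in z:
        prev = v - theta * prev
        out.append(prev)
    return out

def fit_regression_ma1(y, x, step=0.01):
    """Grid search over theta; for each theta, beta is the least-squares
    slope of the filtered y on the filtered x. Returns (beta, theta)."""
    best = None
    n_grid = int(round(0.99 / step))
    for i in range(-n_grid, n_grid + 1):
        theta = i * step
        yf, xf = ma1_filter(y, theta), ma1_filter(x, theta)
        beta = sum(a * b for a, b in zip(xf, yf)) / sum(b * b for b in xf)
        sse = sum((a - beta * b) ** 2 for a, b in zip(yf, xf))
        if best is None or sse < best[0]:
            best = (sse, beta, theta)
    return best[1], best[2]

# Simulated data (hypothetical): y_t = 0.8 x_t + a_t + 0.2 a_{t-1}
random.seed(1)
T = 3000
x = [random.gauss(0.0, 0.2) for _ in range(T)]
y, a_prev = [], 0.0
for t in range(T):
    a = random.gauss(0.0, 0.07)
    y.append(0.8 * x[t] + a + 0.2 * a_prev)
    a_prev = a

beta_hat, theta_hat = fit_regression_ma1(y, x)
print(beta_hat, theta_hat)
```

Filtering both series by the inverse MA polynomial turns the model into an ordinary regression with white noise errors, so the slope and the MA coefficient are estimated together instead of in two separate passes.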


To check the serial correlations of residuals, we recommend that the Ljung-Box 
statistics be used instead of the Durbin-Watson (DW) statistic because the latter 
only considers the lag-1 serial correlation. There are cases in which serial dependence 
in residuals appears at higher order lags. This is particularly so when the 
time series involved exhibits some seasonal behavior. 


Remark. For a residual series $e_t$ with $T$ observations, the Durbin-Watson 
statistic is 

$$DW = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}.$$

Straightforward calculation shows that $DW \approx 2(1 - \hat{\rho}_1)$, where $\hat{\rho}_1$ is the lag-1 ACF 
of $\{e_t\}$. 
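
To make the contrast between the two statistics concrete, the following Python sketch computes DW, the lag-1 sample ACF, and the Ljung-Box statistic Q(m) = T(T+2) Σ ρ̂_k²/(T−k) for a simulated residual series whose only dependence is at lag 4. The seasonal AR coefficient 0.6, the seed, and the function names are assumptions chosen for illustration.

```python
import random

def sample_acf(e, k):
    """Lag-k sample autocorrelation of the series e."""
    n = len(e)
    m = sum(e) / n
    den = sum((v - m) ** 2 for v in e)
    num = sum((e[t] - m) * (e[t - k] - m) for t in range(k, n))
    return num / den

def durbin_watson(e):
    """DW statistic as defined in the Remark above."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)

def ljung_box(e, m):
    """Ljung-Box Q(m) statistic based on the first m sample autocorrelations."""
    n = len(e)
    return n * (n + 2) * sum(sample_acf(e, k) ** 2 / (n - k) for k in range(1, m + 1))

# Hypothetical residuals with dependence only at the seasonal lag 4
random.seed(7)
e = [0.0] * 4
for _ in range(2000):
    e.append(0.6 * e[-4] + random.gauss(0.0, 1.0))
e = e[4:]

dw = durbin_watson(e)
rho1 = sample_acf(e, 1)
q10 = ljung_box(e, 10)
print(dw, 2 * (1 - rho1), q10)
```

In this setting DW stays close to 2 because the lag-1 autocorrelation is near zero, whereas Q(10) is far above the 5% critical value of a chi-squared distribution with 10 degrees of freedom (18.31).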


In S-Plus, regression models with time series errors can be analyzed by the 
command OLS (ordinary least squares) if the residuals assume an AR model. 
Also, to identify a lagged variable, the command is tslag, for example, y = 
tslag(r,1). For the interest rate series, the relevant commands follow, where % 
denotes explanation of the command: 


> r1t=read.table("w-gs1yr.txt",header=T)[,4] % load data 
> r3t=read.table("w-gs3yr.txt",header=T)[,4] 
> fit=OLS(r3t~r1t) % fit the first regression 
> summary(fit) 
> c3t=diff(r3t) % take difference 
> c1t=diff(r1t) 
> fit1=OLS(c3t~c1t) % fit second regression 
> summary(fit1) 
> fit2=OLS(c3t~c1t+tslag(c3t,1)+tslag(c1t,1),na.rm=T) 
> summary(fit2) 


See the output in the next section for more information. 


2.10 CONSISTENT COVARIANCE MATRIX ESTIMATION 


Consider again the regression model in Eq. (2.43). There may exist situations in 
which the error term $e_t$ has serial correlations and/or conditional heteroscedasticity, 
but the main objective of the analysis is to make inference concerning the regression 
coefficients $\alpha$ and $\beta$. See Chapter 3 for discussion of conditional heteroscedasticity. 
In situations under which the OLS estimates of the coefficients remain consistent, 
methods are available to provide a consistent estimate of the covariance matrix of 
the coefficient estimates. Two such methods are widely used. The first method is 
called the heteroscedasticity consistent (HC) estimator; see Eicker (1967) and White 
(1980). The second method is called the heteroscedasticity and autocorrelation 
consistent (HAC) estimator; see Newey and West (1987). 

For ease in discussion, we shall rewrite the regression model as 


$$y_t = x_t' \beta + e_t, \quad t = 1, \ldots, T, \qquad (2.48)$$

where $y_t$ is the dependent variable, $x_t = (x_{1t}, \ldots, x_{kt})'$ is a $k$-dimensional vector 
of explanatory variables including constant, and $\beta = (\beta_1, \ldots, \beta_k)'$ is the parameter 
vector. Here $c'$ denotes the transpose of the vector $c$. The LS estimate of $\beta$ and 
the associated covariance matrix are 

$$\hat{\beta} = \left[ \sum_{t=1}^{T} x_t x_t' \right]^{-1} \sum_{t=1}^{T} x_t y_t, \qquad \mathrm{Cov}(\hat{\beta}) = \sigma_e^2 \left[ \sum_{t=1}^{T} x_t x_t' \right]^{-1},$$

where $\sigma_e^2$ is the variance of $e_t$ and is estimated by the variance of the residuals of the 
regression. In the presence of serial correlations or conditional heteroscedasticity, 
the prior covariance matrix estimator is inconsistent, often resulting in inflating the 
$t$ ratios of $\hat{\beta}$. 

The estimator of White (1980) is 

$$\mathrm{Cov}(\hat{\beta})_{HC} = \left[ \sum_{t=1}^{T} x_t x_t' \right]^{-1} \left[ \sum_{t=1}^{T} \hat{e}_t^2 x_t x_t' \right] \left[ \sum_{t=1}^{T} x_t x_t' \right]^{-1}, \qquad (2.49)$$

where $\hat{e}_t = y_t - x_t' \hat{\beta}$ is the residual at time $t$. The estimator of Newey and West 
(1987) is 

$$\mathrm{Cov}(\hat{\beta})_{HAC} = \left[ \sum_{t=1}^{T} x_t x_t' \right]^{-1} \hat{C}_{HAC} \left[ \sum_{t=1}^{T} x_t x_t' \right]^{-1}, \qquad (2.50)$$

where 

$$\hat{C}_{HAC} = \sum_{t=1}^{T} \hat{e}_t^2 x_t x_t' + \sum_{j=1}^{\ell} w_j \sum_{t=j+1}^{T} \left( x_t \hat{e}_t \hat{e}_{t-j} x_{t-j}' + x_{t-j} \hat{e}_{t-j} \hat{e}_t x_t' \right),$$

where $\ell$ is a truncation parameter and $w_j$ is a weight function such as the Bartlett 
weight function defined by 

$$w_j = 1 - \frac{j}{\ell + 1}.$$

Other weight functions can also be used. Newey and West (1987) suggest choosing 
$\ell$ to be the integer part of $4(T/100)^{2/9}$. This estimator essentially uses a nonparametric 
method to estimate the covariance matrix of $\{\sum_{t=1}^{T} \hat{e}_t x_t\}$. 

For illustration, we employ the first differenced interest rate series in Eq. (2.45). 
The $t$ ratio of the coefficient of $c_{1t}$ is 107.91 if both serial correlation and heteroscedasticity 
in the residuals are ignored; it becomes 48.44 when the HC estimator 
is used, and it reduces to 39.92 when the HAC estimator is used. The S-Plus demonstration 
below also uses a regression that includes lagged values $c_{1,t-1}$ and $c_{3,t-1}$ 
as regressors to take care of serial correlations in the residuals. One can also apply 
the HC or HAC estimator to the fitted model to refine the $t$ ratios of the coefficient 
estimates. 
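
For readers who wish to see the arithmetic behind Eqs. (2.49) and (2.50), the following Python sketch evaluates the ordinary, HC, and HAC variances in the simplest case of a single regressor without intercept (k = 1), where all matrices reduce to scalars. The simulated data and the function names are hypothetical, and the truncation lag follows the integer part of 4(T/100)^{2/9}.

```python
import random

def ols_no_intercept(y, x):
    """Least-squares slope and residuals for y_t = beta * x_t + e_t."""
    beta = sum(a * b for a, b in zip(x, y)) / sum(b * b for b in x)
    return beta, [a - beta * b for a, b in zip(y, x)]

def newey_west_lag(n):
    """Truncation parameter: integer part of 4 * (T/100)^(2/9)."""
    return int(4 * (n / 100.0) ** (2.0 / 9.0))

def var_ols(x, resid):
    """Conventional variance: residual variance divided by sum of x_t^2."""
    s2 = sum(r * r for r in resid) / (len(x) - 1)
    return s2 / sum(v * v for v in x)

def var_hc(x, resid):
    """Scalar version of Eq. (2.49)."""
    sxx = sum(v * v for v in x)
    return sum((r * v) ** 2 for r, v in zip(resid, x)) / sxx ** 2

def var_hac(x, resid, lag=None):
    """Scalar version of Eq. (2.50) with Bartlett weights w_j = 1 - j/(lag+1)."""
    n = len(x)
    if lag is None:
        lag = newey_west_lag(n)
    g = [r * v for r, v in zip(resid, x)]          # x_t * e_t
    c = sum(v * v for v in g)
    for j in range(1, lag + 1):
        w = 1.0 - j / (lag + 1.0)
        c += 2.0 * w * sum(g[t] * g[t - j] for t in range(j, n))
    return c / sum(v * v for v in x) ** 2

# Hypothetical data: errors are serially correlated and heteroscedastic
random.seed(5)
x, y, e_prev = [], [], 0.0
for _ in range(1000):
    xt = random.gauss(0.0, 1.0)
    et = 0.4 * e_prev + (0.5 + abs(xt)) * random.gauss(0.0, 1.0)
    x.append(xt)
    y.append(0.8 * xt + et)
    e_prev = et

beta, resid = ols_no_intercept(y, x)
for name, v in [("OLS", var_ols(x, resid)), ("HC", var_hc(x, resid)), ("HAC", var_hac(x, resid))]:
    print(name, "t ratio:", beta / v ** 0.5)
```

With the truncation lag set to zero the HAC formula collapses to the HC formula, which gives a quick consistency check on an implementation.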


S-Plus Demonstration 
The following output has been edited and % denotes explanation: 


> module(finmetrics) 
> r1=read.table("w-gs1yr.txt",header=T)[,4] % Load data 
> r3=read.table("w-gs3yr.txt",header=T)[,4] 
> c1=diff(r1) % Take 1st difference 
> c3=diff(r3) 

> reg.fit=OLS(c3~c1) % Fit a simple linear regression. 
> summary(reg.fit) 

Call: 
OLS(formula = c3 ~ c1) 

Residuals: 
     Min      1Q  Median      3Q     Max 
 -0.4246 -0.0358 -0.0012  0.0347  0.4892 

Coefficients: 
              Value Std. Error  t value Pr(>|t|) 
(Intercept) -0.0001     0.0014  -0.0757   0.9397 
         c1  0.7919     0.0073 107.9063   0.0000 

Regression Diagnostics: 
         R-Squared 0.8253 
Adjusted R-Squared 0.8253 
Durbin-Watson Stat 1.6456 

Residual Diagnostics: 
                 Stat P-Value 
Jarque-Bera 1644.6146  0.0000 
  Ljung-Box  230.0477  0.0000 

Residual standard error: 0.06897 on 2464 degrees of freedom 

> summary(reg.fit,correction="white") % Use the HC estimator 

Coefficients: 
              Value Std. Error  t value Pr(>|t|) 
(Intercept) -0.0001     0.0014  -0.0757   0.9396 
         c1  0.7919     0.0163  48.4405   0.0000 

> summary(reg.fit,correction="nw") % Use the HAC estimator 

Coefficients: 
              Value Std. Error  t value Pr(>|t|) 
(Intercept) -0.0001     0.0016  -0.0678   0.9459 
         c1  0.7919     0.0198  39.9223   0.0000 

% Below, fit a regression model with time series errors. 
> reg.ts=OLS(c3~c1+tslag(c3,1)+tslag(c1,1),na.rm=T) 
> summary(reg.ts) 

Call: 
OLS(formula = c3 ~ c1 + tslag(c3, 1) + tslag(c1, 1), na.rm = T) 

Residuals: 
     Min      1Q  Median      3Q     Max 
 -0.4481 -0.0355 -0.0008  0.0341  0.4582 

Coefficients: 
               Value Std. Error  t value Pr(>|t|) 
 (Intercept) -0.0001     0.0014  -0.0636   0.9493 
          c1  0.7971     0.0077 103.6320   0.0000 
tslag(c3, 1)  0.1766     0.0198   8.9057   0.0000 
tslag(c1, 1) -0.1580     0.0174  -9.0583   0.0000 

Regression Diagnostics: 
         R-Squared 0.8312 
Adjusted R-Squared 0.8310 
Durbin-Watson Stat 1.9865 

Residual Diagnostics: 
                 Stat P-Value 
Jarque-Bera 1620.5090  0.0000 
  Ljung-Box  131.6048  0.0000 

Residual standard error: 0.06785 on 2461 degrees of freedom 


Let $\hat{\beta}_j$ be the $j$th element of $\hat{\beta}$. When $k > 1$, the HC variance of $\hat{\beta}_j$ in 
Eq. (2.49) can be obtained by using an auxiliary regression. Let $x_{-j,t}$ be the 
$(k - 1)$-dimensional vector obtained by removing the element $x_{jt}$ from $x_t$. 
Consider the auxiliary regression 

$$x_{jt} = x_{-j,t}' \gamma + v_t, \quad t = 1, \ldots, T. \qquad (2.51)$$

Let $\hat{v}_t$ be the least-squares residual of this auxiliary regression. It can be shown 
that 

$$\mathrm{Var}(\hat{\beta}_j)_{HC} = \frac{\sum_{t=1}^{T} \hat{e}_t^2 \hat{v}_t^2}{\left( \sum_{t=1}^{T} \hat{v}_t^2 \right)^2},$$

where $\hat{e}_t$ is the residual of the original regression in Eq. (2.48). The auxiliary regression 
is simply a step taken to achieve orthogonality between $\hat{v}_t$ and the rest of the 
regressors so that the formula in Eq. (2.49) can be simplified. 
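
The equivalence between the auxiliary-regression formula and the corresponding diagonal element of Eq. (2.49) can be checked numerically. The Python sketch below does so for the simplest case with k = 2 (a constant and one regressor), in which the auxiliary regression of the regressor on the constant merely removes its sample mean. The simulated data are hypothetical and serve only as a check.

```python
import random

# Hypothetical data with heteroscedastic errors
random.seed(3)
T = 500
x = [random.gauss(0.0, 1.0) for _ in range(T)]
y = [1.0 + 0.5 * xi + (0.5 + abs(xi)) * random.gauss(0.0, 1.0) for xi in x]

# OLS fit of y_t = b0 + b1 * x_t + e_t
xbar, ybar = sum(x) / T, sum(y) / T
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Sandwich form of Eq. (2.49) with x_t = (1, x_t)'
a11, a12, a22 = float(T), sum(x), sum(xi * xi for xi in x)
det = a11 * a22 - a12 * a12
row = (-a12 / det, a11 / det)            # second row of [sum x_t x_t']^{-1}
m11 = sum(ei * ei for ei in e)
m12 = sum(ei * ei * xi for ei, xi in zip(e, x))
m22 = sum(ei * ei * xi * xi for ei, xi in zip(e, x))
var_sandwich = (row[0] * (m11 * row[0] + m12 * row[1])
                + row[1] * (m12 * row[0] + m22 * row[1]))

# Auxiliary regression of x_t on the constant: residual is x_t minus its mean
v = [xi - xbar for xi in x]
var_aux = sum((ei * vi) ** 2 for ei, vi in zip(e, v)) / sum(vi * vi for vi in v) ** 2

print(var_sandwich, var_aux)
```

The two numbers agree up to rounding error, which is the orthogonality argument of the text in numerical form.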


2.11 LONG-MEMORY MODELS 


We have discussed that for a stationary time series the ACF decays exponentially to 
zero as lag increases. Yet for a unit-root nonstationary time series, it can be shown 
that the sample ACF converges to 1 for all fixed lags as the sample size increases; 
see Chan and Wei (1988) and Tiao and Tsay (1983). There exist some time series 
whose ACF decays slowly to zero at a polynomial rate as the lag increases. These 
processes are referred to as long-memory time series. One such example is the 
fractionally differenced process defined by 


$$(1 - B)^d x_t = a_t, \quad -0.5 < d < 0.5, \qquad (2.52)$$


where $\{a_t\}$ is a white noise series. Properties of model (2.52) have been widely 
studied in the literature (e.g., Hosking, 1981). We now summarize some of these 
properties: 


1. If $d < 0.5$, then $x_t$ is a weakly stationary process and has the infinite MA 
representation 

$$x_t = a_t + \sum_{i=1}^{\infty} \psi_i a_{t-i} \quad \text{with} \quad \psi_k = \frac{d(1+d)\cdots(k-1+d)}{k!} = \frac{(k+d-1)!}{k!(d-1)!}.$$

2. If $d > -0.5$, then $x_t$ is invertible and has the infinite AR representation 

$$x_t = \sum_{i=1}^{\infty} \pi_i x_{t-i} + a_t \quad \text{with} \quad \pi_k = \frac{-d(1-d)\cdots(k-1-d)}{k!} = \frac{(k-d-1)!}{k!(-d-1)!}.$$

3. For $-0.5 < d < 0.5$, the ACF of $x_t$ is 

$$\rho_k = \frac{d(1+d)\cdots(k-1+d)}{(1-d)(2-d)\cdots(k-d)}, \quad k = 1, 2, \ldots.$$

In particular, $\rho_1 = d/(1 - d)$ and 

$$\rho_k \approx \frac{(-d)!}{(d-1)!} k^{2d-1} \quad \text{as } k \to \infty.$$

4. For $-0.5 < d < 0.5$, the PACF of $x_t$ is $\phi_{k,k} = d/(k - d)$ for $k = 1, 2, \ldots$. 

5. For $-0.5 < d < 0.5$, the spectral density function $f(\omega)$ of $x_t$, which is the 
Fourier transform of the ACF of $x_t$, satisfies 

$$f(\omega) \sim \omega^{-2d}, \quad \text{as } \omega \to 0, \qquad (2.53)$$

where $\omega \in [0, 2\pi]$ denotes the frequency. 


Of particular interest here is the behavior of ACF of $x_t$ when $d < 0.5$. The property 
says that $\rho_k \sim k^{2d-1}$, which decays at a polynomial, instead of exponential, rate. 
For this reason, such an $x_t$ process is called a long-memory time series. A special 
characteristic of the spectral density function in Eq. (2.53) is that the spectrum 
diverges to infinity as $\omega \to 0$. However, the spectral density function of a stationary 
ARMA process is bounded for all $\omega \in [0, 2\pi]$. 
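
The polynomial decay can be seen numerically. The following Python sketch computes the ACF of the fractionally differenced process through the recursion ρ_k = ρ_{k−1}(k − 1 + d)/(k − d), which follows from the product formula in property 3, and compares it with the approximation in that property. The value d = 0.3 and the comparison with an AR(1) coefficient of 0.3 are arbitrary choices for illustration; noninteger factorials are evaluated with the gamma function, so that (−d)!/(d − 1)! = Γ(1 − d)/Γ(d).

```python
import math

def frac_acf(d, kmax):
    """ACF rho_0, ..., rho_kmax of (1 - B)^d x_t = a_t for -0.5 < d < 0.5."""
    rho = [1.0]
    for k in range(1, kmax + 1):
        rho.append(rho[-1] * (k - 1 + d) / (k - d))
    return rho

d = 0.3
rho = frac_acf(d, 2000)
const = math.gamma(1 - d) / math.gamma(d)      # (-d)! / (d-1)!
approx_2000 = const * 2000 ** (2 * d - 1)

print("rho_1:", rho[1], "d/(1-d):", d / (1 - d))
print("rho_2000:", rho[2000], "approximation:", approx_2000)
print("lag 50: long memory", rho[50], "versus AR(1) with coefficient 0.3:", 0.3 ** 50)
```

At lag 50 the autocorrelation of the long-memory process is still clearly positive, whereas that of the AR(1) process is numerically negligible.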

Earlier we used the binomial theorem for noninteger powers 


$$(1 - B)^d = \sum_{k=0}^{\infty} (-1)^k \binom{d}{k} B^k, \qquad \binom{d}{k} = \frac{d(d-1)\cdots(d-k+1)}{k!}.$$


If the fractionally differenced series $(1 - B)^d x_t$ follows an ARMA($p$, $q$) model, 
then $x_t$ is called an ARFIMA($p$, $d$, $q$) process, which is a generalized ARIMA 
model by allowing for noninteger $d$. 
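
As a small illustration of the binomial expansion, the Python sketch below generates the coefficients of B^k in (1 − B)^d by the recursion c_k = c_{k−1}(k − 1 − d)/k with c_0 = 1, and applies the truncated filter to a series. The function names and the truncation rule (using only available past observations) are assumptions for illustration; for d = 1 the filter reduces to the usual first difference.

```python
def frac_diff_weights(d, n):
    """First n coefficients of B^k in the expansion of (1 - B)^d."""
    c = [1.0]
    for k in range(1, n):
        c.append(c[-1] * (k - 1 - d) / k)
    return c

def frac_diff(x, d, n_weights):
    """Apply the truncated filter (1 - B)^d to x, using only available past values."""
    c = frac_diff_weights(d, n_weights)
    return [sum(c[k] * x[t - k] for k in range(min(t + 1, n_weights)))
            for t in range(len(x))]

print(frac_diff_weights(1.0, 4))                    # ordinary differencing
print(frac_diff_weights(0.4, 4))                    # slowly decaying weights
print(frac_diff([1.0, 3.0, 6.0, 10.0], 1.0, 4))
```

For noninteger d the weights never become exactly zero, which is why every past observation enters the fractionally differenced series.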

In practice, if the sample ACF of a time series is not large in magnitude, 
but decays slowly, then the series may have long memory. As an illustration, 
Figure 2.22 shows the sample ACFs of the absolute series of daily simple returns 
for the CRSP value- and equal-weighted indexes from January 2, 1970, to December 
31, 2008. The ACFs are relatively small in magnitude but decay very slowly; 
they appear to be significant at the 5% level even after 300 lags. For more information 
about the behavior of sample ACF of absolute return series, see Ding, Granger, 
and Engle (1993). For the pure fractionally differenced model in Eq. (2.52), one 
can estimate d using either a maximum-likelihood method or a regression method 
with logged periodogram at the lower frequencies. Finally, long-memory models 
have attracted some attention in the finance literature in part because of the work 
on fractional Brownian motion in the continuous-time models. 

Figure 2.22 Sample autocorrelation function of absolute series of daily simple returns for CRSP 
value- and equal-weighted indexes: (a) value-weighted index return and (b) equal-weighted index return. 
Sample period is from January 2, 1970, to December 31, 2008. 


APPENDIX: SOME SCA COMMANDS 


In this appendix, we give the SCA commands used in Section 2.9. The 1-year 
maturity interest rates are in the file w-gs1yr.txt and the 3-year rates are in the 
file w-gs3yr.txt. 


-- load the data into SCA, denote the data by rate1 and rate3. 
input year,mon,day,rate1. file 'w-gs1yr.txt' 
input year,mon,day,rate3. file 'w-gs3yr.txt' 
-- specify a simple linear regression model. 
tsm m1. model rate3=b0+(b1)rate1+noise. 
-- estimate the specified model and store residual in r1. 
estim m1. hold resi(r1). 
-- compute 10 lags of residual acf. 
acf r1. maxl 10. 
-- difference the two series, denote the new series by c1t 
and c3t 
diff old rate1,rate3. new c1t, c3t. compress. 
-- specify a linear regression model for the differenced data 
tsm m2. model c3t=h0+(h1)c1t+noise. 
-- estimation 
estim m2. hold resi(r2). 
-- compute residual acf. 
acf r2. maxl 10. 
-- specify a regression model with time series errors. 
tsm m3. model c3t=g0+(g1)c1t+(1)noise. 
-- estimate the model using the exact likelihood method. 
estim m3. method exact. hold resi(r3). 
-- compute residual acf. 
acf r3. maxl 10. 
-- refine the model to include more MA lags. 
tsm m4. model c3t=g0+(g1)c1t+(1,4,6,7)noise. 
-- estimation 
estim m4. method exact. hold resi(r4). 
-- compute residual acf. 
acf r4. maxl 10. 
-- exit SCA 
stop 


EXERCISES 


Unless otherwise specified, use the 5% significance level to draw conclusions in the 
exercises. 


2.1. Suppose that the simple return of a monthly bond index follows the MA(1) 
model 

$$R_t = a_t + 0.2 a_{t-1}, \quad \sigma_a = 0.025.$$

Assume that $a_{100} = 0.01$. Compute the 1-step- and 2-step-ahead forecasts 
of the return at the forecast origin $t = 100$. What are the standard deviations 
of the associated forecast errors? Also compute the lag-1 and lag-2 
autocorrelations of the return series. 

2.2. Suppose that the daily log return of a security follows the model 

$$r_t = 0.01 + 0.2 r_{t-2} + a_t,$$

where $\{a_t\}$ is a Gaussian white noise series with mean zero and variance 0.02. 
What are the mean and variance of the return series $r_t$? Compute the lag-1 
and lag-2 autocorrelations of $r_t$. Assume that $r_{100} = -0.01$ and $r_{99} = 0.02$. 
Compute the 1- and 2-step-ahead forecasts of the return series at the forecast 
origin $t = 100$. What are the associated standard deviations of the forecast 
errors? 

2.3. Consider the monthly U.S. unemployment rate from January 1948 to March 
2009 in the file m-unrate.txt. The data are seasonally adjusted and 
obtained from the Federal Reserve Bank of St. Louis. Build a time series 
model for the series and use the model to forecast the unemployment rate 
for April, May, June, and July of 2009. In addition, does the fitted 
model imply the existence of business cycles? Why? (Note that more than 
one model fits the data well. You only need an adequate model.) 

2.4. Consider the monthly simple returns of the Decile 1, Decile 2, Decile 9, and 
Decile 10 of NYSE/AMEX/NASDAQ based on market capitalization. The 
data span is from January 1970 to December 2008, and the data are obtained 
from CRSP. 

(a) For the return series of Decile 2 and Decile 10, test the null hypothesis 
that the first 12 lags of autocorrelations are zero at the 5% level. Draw 
your conclusion. 

(b) Build an ARMA model for the return series of Decile 2. Perform model 
checking and write down the fitted model. 

(c) Use the fitted ARMA model to produce 1- to 12-step-ahead forecasts of 
the series and the associated standard errors of forecasts. 

2.5. Consider the daily simple returns of IBM stock from 1970 to 2008 in the 
file d-ibm3dx7008.txt. Compute the first 100 lags of ACF of the absolute 
series of daily simple returns of IBM stock. Is there evidence of long-range 
dependence? Why? 

2.6. Consider the demand of electricity of a manufacturing sector in the United 
States. The data are logged, denote the demand of a fixed day of each month, 
and are in power6.txt. Build a time series model for the series and use the 
fitted model to produce 1- to 24-step-ahead forecasts. 

2.7. Consider the daily simple returns of IBM stock, CRSP value-weighted index, 
CRSP equal-weighted index, and the S&P composite index from January 
1980 to December 2008. The index returns include dividend distributions. 
The data file is d-ibm3dxwkdays8008.txt, which has 12 columns. The 
columns are (year, month, day, IBM, VW, EW, SP, M, T, W, H, F), where M, 
T, W, R, and F denote indicator variables for Monday to Friday, respectively. 
Use a regression model to study the effects of trading days on the equal-weighted 
index returns. What is the fitted model? Are the weekday effects 
significant in the returns at the 5% level? Use the HAC estimator of the 
covariance matrix to obtain the $t$ ratio of regression estimates. Does the 
HAC estimator change the conclusion of weekday effects? Are there serial 
correlations in the regression residuals? If yes, build a regression model with 
time series error to study weekday effects. 

2.8. Consider the data set of the previous question, but focus on the daily simple 
returns of the S&P composite index. Perform the necessary data analysis 
and statistical tests using the 5% significance level to answer the following 
questions: 

(a) Is there any weekday effect on the daily simple returns of the S&P composite 
index? You may employ a linear regression model to answer this 
question. Estimate the model, check its validity, and test the hypothesis 
that there is no Friday effect. Draw your conclusion. 

(b) Check the residual serial correlations using Q(12) statistic. Are there any 
significant serial correlations in the residuals? If yes, build a regression 
model with time series errors for the data. 

2.9. Now consider similar questions of the previous exercise for the IBM stock 
returns. 

(a) Is there any weekday effect on the daily simple returns of IBM stock? 
Estimate your model and test the hypothesis that there is no Friday effect. 
Draw your conclusion. 

(b) Are there serial correlations in the residuals? Use Q(12) to perform the 
test. Draw your conclusion. 

(c) Refine the above model by using the technique of regression model with 
time series errors. Is there a significant weekday effect based on the 
refined model? 

2.10. Consider the weekly yields of Moody’s Aaa and Baa seasoned bonds from 
January 5, 1962, to April 10, 2009. The data are obtained from the Federal 
Reserve Bank of St. Louis. Weekly yields are averages of daily yields. Obtain 
the summary statistics (sample mean, standard deviation, skewness, excess 
kurtosis, minimum, and maximum) of the two yield series. Are the bond 
yields skewed? Do they have heavy tails? Answer the questions using 5% 
significance level. 

2.11. Consider the monthly Aaa bond yields of the prior problem. Build a time 
series model for the series. 

2.12. Again, consider the two bond yield series, that is, Aaa and Baa. What is the 
relationship between the two series? To answer this question, build a time 
series model using yields of Aaa bonds as the dependent variable and yields 
of Baa bonds as independent variable. 

2.13. Consider the monthly log returns of CRSP equal-weighted index from January 
1962 to December 1999 for 456 observations. You may obtain the data 
from CRSP directly or from the file m-ew6299.txt on the Web. 

(a) Build an AR model for the series and check the fitted model. 

(b) Build an MA model for the series and check the fitted model. 

(c) Compute 1- and 2-step-ahead forecasts of the AR and MA models built 
in the previous two questions. 

(d) Compare the fitted AR and MA models. 

2.14. This problem is concerned with the dynamic relationship between the spot 
and futures prices of the S&P 500 index. The data file sp5may.dat has 
three columns: log(futures price), log(spot price), and cost-of-carry (×100). 
The data were obtained from the Chicago Mercantile Exchange for the S&P 
500 stock index in May 1993 and its June futures contract. The time interval 
is 1 minute (intraday). Several authors used the data to study index futures 
arbitrage. Here we focus on the first two columns. Let $f_t$ and $s_t$ be the 
log prices of futures and spot, respectively. Consider $y_t = f_t - f_{t-1}$ and 
$x_t = s_t - s_{t-1}$. Build a regression model with time series errors between $\{y_t\}$ 
and $\{x_t\}$, with $y_t$ being the dependent variable. 

2.15. The quarterly gross domestic product implicit price deflator is often used 
as a measure of inflation. The file q-gdpdef.txt contains the data for the 
United States from the first quarter of 1947 to the last quarter of 2008. Data 
format is year, month, day, and deflator. The data are seasonally adjusted 
and equal to 100 for year 2000. Build an ARIMA model for the series and 
check the validity of the fitted model. Use the fitted model to predict the 
inflation for each quarter of 2009. The data are obtained from the Federal 
Reserve Bank of St Louis. 


CHAPTER 3 


Conditional Heteroscedastic Models 


The objective of this chapter is to study some methods and econometric models 
available in the literature for modeling the volatility of an asset return. The models 
are referred to as conditional heteroscedastic models. 

Volatility is an important factor in options trading. Here volatility means the 
conditional standard deviation of the underlying asset return. Consider, for example, 
the price of a European call option, which is a contract giving its holder the right, 
but not the obligation, to buy a fixed number of shares of a specified common 
stock at a fixed price on a given date. The fixed price is called the strike price 
and is commonly denoted by K. The given date is called the expiration date. The 
important time duration here is the time to expiration (measured in years), and we 
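denote it by $\ell$. The well-known Black-Scholes option pricing formula states that 
the price of such a call option is 

$$c_t = P_t \Phi(x) - K e^{-r\ell} \Phi(x - \sigma_t \sqrt{\ell}), \quad \text{and} \quad x = \frac{\ln(P_t/K) + r\ell}{\sigma_t \sqrt{\ell}} + \frac{1}{2} \sigma_t \sqrt{\ell}, \qquad (3.1)$$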


where $P_t$ is the current price of the underlying stock, $r$ is the continuously compounded 
risk-free interest rate, $\sigma_t$ is the annualized conditional standard deviation 
of the log return of the specified stock, and $\Phi(x)$ is the cumulative distribution 
function of the standard normal random variable evaluated at $x$. A derivation of 
the formula is given in Chapter 6. The formula has several nice interpretations, 
but it suffices to say here that the conditional standard deviation of the log return 
of the underlying stock plays an important role. This volatility evolves over time. 
If the holder can exercise her right any time on or before the expiration date, then 
the option is called an American call option. 
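
To show how the volatility enters the price, the following Python sketch evaluates Eq. (3.1) directly, with the normal distribution function written in terms of the error function. The function and argument names and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math

def norm_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(price, strike, rate, sigma, ell):
    """European call price of Eq. (3.1); ell is the time to expiration in years."""
    root = sigma * math.sqrt(ell)
    x = (math.log(price / strike) + rate * ell) / root + 0.5 * root
    return price * norm_cdf(x) - strike * math.exp(-rate * ell) * norm_cdf(x - root)

# Hypothetical inputs: P_t = K = 100, r = 5%, one year to expiration
for sigma in (0.1, 0.2, 0.4):
    print(sigma, bs_call(100.0, 100.0, 0.05, sigma, 1.0))
```

With the other inputs held fixed, the call price increases with the volatility, which is why the conditional standard deviation plays such an important role in option pricing.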

Volatility has many other financial applications. As discussed in Chapter 7, 
volatility modeling provides a simple approach to calculating value at risk of a 
financial position in risk management. It plays an important role in asset allocation 
under the mean-variance framework. Furthermore, modeling the volatility of a 
time series can improve the efficiency in parameter estimation and the accuracy in 
interval forecast. Finally, the volatility index of a market has recently become a 
financial instrument. The VIX volatility index compiled by the Chicago Board 
Options Exchange (CBOE) started to trade in futures on March 26, 2004. 

The univariate volatility models discussed in this chapter include the autoregressive 
conditional heteroscedastic (ARCH) model of Engle (1982), the generalized 
ARCH (GARCH) model of Bollerslev (1986), the exponential GARCH (EGARCH) 
model of Nelson (1991), the threshold GARCH (TGARCH) model of Glosten, 
Jagannathan, and Runkle (1993) and Zakoian (1994), the conditional heteroscedastic 
autoregressive moving-average (CHARMA) model of Tsay (1987), the random 
coefficient autoregressive (RCA) model of Nicholls and Quinn (1982), and the 
stochastic volatility (SV) models of Melino and Turnbull (1990), Taylor (1994), 
Harvey, Ruiz, and Shephard (1994), and Jacquier, Polson, and Rossi (1994). We 
also discuss advantages and weaknesses of each volatility model and show some 
applications of the models. Multivariate volatility models, including those with 
time-varying correlations, are discussed in Chapter 10. The chapter also discusses 
some alternative approaches to volatility modeling in Section 3.15, including use 
of daily high and low prices of an asset. 


3.1 CHARACTERISTICS OF VOLATILITY 


A special feature of stock volatility is that it is not directly observable. For example, 
consider the daily log returns of IBM stock. The daily volatility is not directly 
observable from the return data because there is only one observation in a trading 
day. If intraday data of the stock, such as 10-minute returns, are available, then one 
can estimate the daily volatility. See Section 3.15. The accuracy of such an estimate 
deserves a careful study, however. For example, stock volatility consists of intraday 
volatility and overnight volatility with the latter denoting variation between trading 
days. The high-frequency intraday returns contain only very limited information 
about the overnight volatility. The unobservability of volatility makes it difficult 
to evaluate the forecasting performance of conditional heteroscedastic models. We 
discuss this issue later. 

In options markets, if one accepts the idea that the prices are governed by an 
econometric model such as the Black-Scholes formula, then one can use the price 
to obtain the “implied” volatility. Yet this approach is often criticized for using a 
specific model, which is based on some assumptions that might not hold in practice. 
For instance, from the observed prices of a European call option, one can use the 
Black-Scholes formula in Eq. (3.1) to deduce the conditional standard deviation 
$\sigma_t$. The resulting value of $\sigma_t$ is called the implied volatility of the underlying stock. 
However, this implied volatility is derived under the assumption that the price of 
the underlying asset follows a geometric Brownian motion. It might be different 
from the actual volatility. Experience shows that implied volatility of an asset return 
tends to be larger than that obtained by using a GARCH type of volatility model. 
This might be due to the risk premium for volatility or to the way daily returns are 
calculated. The VIX of CBOE is an implied volatility. 
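
Deducing the implied volatility from an observed call price amounts to inverting Eq. (3.1) numerically. A minimal Python sketch using bisection is given here; it relies on the fact that the call price is increasing in the volatility. The search interval, tolerance, function names, and numerical inputs are assumptions made for illustration, and the "observed" price is generated from the formula itself so that the answer is known.

```python
import math

def norm_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(price, strike, rate, sigma, ell):
    """European call price of Eq. (3.1); ell is the time to expiration in years."""
    root = sigma * math.sqrt(ell)
    x = (math.log(price / strike) + rate * ell) / root + 0.5 * root
    return price * norm_cdf(x) - strike * math.exp(-rate * ell) * norm_cdf(x - root)

def implied_vol(call_price, price, strike, rate, ell, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection search for the sigma that reproduces the given call price."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(price, strike, rate, mid, ell) < call_price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical example: price an option with sigma = 0.25, then recover sigma
observed = bs_call(100.0, 95.0, 0.03, 0.25, 0.5)
iv = implied_vol(observed, 100.0, 95.0, 0.03, 0.5)
print(observed, iv)
```

The recovered value is only as good as the pricing formula: if the assumptions behind Eq. (3.1) fail, the number returned is still called the implied volatility but need not equal the actual volatility.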

Although volatility is not directly observable, it has some characteristics that are 
commonly seen in asset returns. First, there exist volatility clusters (i.e., volatility 
may be high for certain time periods and low for other periods). Second, volatility 
evolves over time in a continuous manner; that is, volatility jumps are rare. 
Third, volatility does not diverge to infinity; that is, volatility varies within some 
fixed range. Statistically speaking, this means that volatility is often stationary. 
Fourth, volatility seems to react differently to a big price increase or a big price 
drop, referred to as the leverage effect. These properties play an important role 
in the development of volatility models. Some volatility models were proposed 
specifically to correct the weaknesses of the existing ones for their inability to 
capture the characteristics mentioned earlier. For example, the EGARCH model 
was developed to capture the asymmetry in volatility induced by big “positive” 
and “negative” asset returns. 


3.2 STRUCTURE OF A MODEL 


Let $r_t$ be the log return of an asset at time index $t$. The basic idea behind volatility 
study is that the series $\{r_t\}$ is either serially uncorrelated or with minor lower order 
serial correlations, but it is a dependent series. For illustration, consider the monthly 
log stock returns of Intel Corporation from January 1973 to December 2008 shown 
in Figure 3.1. Figure 3.2(a) shows the sample ACF of the log return series, 
which suggests no significant serial correlations except for a minor one at lag 7. 
Figure 3.2(c) shows the sample ACF of the absolute log returns (i.e., $|r_t|$), whereas 
Figure 3.2(b) shows the sample ACF of the squared log returns $r_t^2$. These two plots 
clearly suggest that the monthly log returns are not serially independent. Combining 
the three plots, it seems that the log returns are indeed serially uncorrelated but 
dependent. Volatility models attempt to capture such dependence in the return series. 
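
The distinction between "uncorrelated" and "independent" can be reproduced with simulated data. The Python sketch below generates a toy series whose conditional variance depends on the previous squared value and compares the lag-1 sample ACF of the series, of its squares, and of its absolute values. The variance recursion, its coefficients (0.7 and 0.3), the seed, and the sample size are assumptions made only for this illustration; they are not estimates for the Intel data.

```python
import math
import random

def sample_acf(z, k):
    """Lag-k sample autocorrelation of the series z."""
    n = len(z)
    m = sum(z) / n
    den = sum((v - m) ** 2 for v in z)
    num = sum((z[t] - m) * (z[t - k] - m) for t in range(k, n))
    return num / den

# Toy series: conditional variance 0.7 + 0.3 * (previous value)^2
random.seed(11)
r, prev = [], 0.0
for _ in range(20000):
    sigma2 = 0.7 + 0.3 * prev * prev
    prev = math.sqrt(sigma2) * random.gauss(0.0, 1.0)
    r.append(prev)

acf_r = sample_acf(r, 1)
acf_sq = sample_acf([v * v for v in r], 1)
acf_abs = sample_acf([abs(v) for v in r], 1)
print("returns:", acf_r, "squared:", acf_sq, "absolute:", acf_abs)
```

The series itself shows essentially no lag-1 autocorrelation, while its squares and absolute values are clearly autocorrelated, the same pattern as in Figure 3.2.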

To put the volatility models in proper perspective, it is informative to consider 
the conditional mean and variance of $r_t$ given $F_{t-1}$; that is, 

$$\mu_t = E(r_t | F_{t-1}), \quad \sigma_t^2 = \mathrm{Var}(r_t | F_{t-1}) = E[(r_t - \mu_t)^2 | F_{t-1}], \qquad (3.2)$$

where $F_{t-1}$ denotes the information set available at time $t - 1$. Typically, $F_{t-1}$ 
consists of all linear functions of the past returns. As shown by the empirical 
examples of Chapter 2 and Figure 3.2, serial dependence of a stock return series $r_t$ 
is weak if it exists at all. Therefore, the equation for $\mu_t$ in (3.2) should be simple, 
and we assume that $r_t$ follows a simple time series model such as a stationary 
ARMA($p$, $q$) model with some explanatory variables. In other words, we entertain 
the model 


$$r_t = \mu_t + a_t, \quad \mu_t = \sum_{i=1}^{p} \phi_i y_{t-i} - \sum_{i=1}^{q} \theta_i a_{t-i}, \quad y_t = r_t - \phi_0 - \sum_{i=1}^{k} \beta_i x_{it}, \qquad (3.3)$$

for $r_t$, where $k$, $p$, and $q$ are nonnegative integers, and $x_{it}$ are explanatory variables. 
Here $y_t$ is simply a notation representing the adjusted return series after removing 
the effect of explanatory variables. 

Figure 3.1 Time plot of monthly log returns of Intel stock from January 1973 to December 2008. 

Figure 3.2 Sample ACF and PACF of various functions of monthly log stock returns of Intel Corporation 
from January 1973 to December 2008: (a) ACF of the log returns, (b) ACF of the squared log 
returns, (c) ACF of the absolute log returns, and (d) PACF of the squared log returns. 

Model (3.3) illustrates a possible financial application of the regression model 
with time series errors of Chapter 2. The order (p,q) of an ARMA model may 
depend on the frequency of the return series. For example, daily returns of a market 
index often show some minor serial correlations, but monthly returns of the index 
may not contain any significant serial correlation. The explanatory variables $x_{it}$ in
Eq. (3.3) are flexible. For example, a dummy variable can be used for the Mondays
to study the effect of the weekend on daily stock returns. In the capital asset pricing
model (CAPM), the mean equation of $r_t$ can be written as $r_t = \phi_0 + \beta r_{m,t} + a_t$,
where $r_{m,t}$ denotes the market return.

Combining Eqs. (3.2) and (3.3), we have 


$$\sigma_t^2 = \mathrm{Var}(r_t \mid F_{t-1}) = \mathrm{Var}(a_t \mid F_{t-1}). \tag{3.4}$$


The conditional heteroscedastic models of this chapter are concerned with the
evolution of $\sigma_t^2$. The manner under which $\sigma_t^2$ evolves over time distinguishes one
volatility model from another.

Conditional heteroscedastic models can be classified into two general categories. 
Those in the first category use an exact function to govern the evolution of $\sigma_t^2$,
whereas those in the second category use a stochastic equation to describe $\sigma_t^2$.
The GARCH model belongs to the first category, whereas the stochastic volatility 
model is in the second category. 

Throughout the book, $a_t$ is referred to as the shock or innovation of an asset
return at time t and $\sigma_t$ is the positive square root of $\sigma_t^2$. The model for $\mu_t$ in Eq.
(3.3) is referred to as the mean equation for $r_t$ and the model for $\sigma_t^2$ is the volatility
equation for $r_t$. Therefore, modeling conditional heteroscedasticity amounts to
augmenting a dynamic equation, which governs the time evolution of the conditional
variance of the asset return, to a time series model.


3.3 MODEL BUILDING 


Building a volatility model for an asset return series consists of four steps: 


1. Specify a mean equation by testing for serial dependence in the data and, if 
necessary, building an econometric model (e.g., an ARMA model) for the 
return series to remove any linear dependence. 


2. Use the residuals of the mean equation to test for ARCH effects. 


3. Specify a volatility model if ARCH effects are statistically significant, and 
perform a joint estimation of the mean and volatility equations. 


4. Check the fitted model carefully and refine it if necessary.

For most asset return series, the serial correlations are weak, if any. Thus,
building a mean equation amounts to removing the sample mean from the data if 
the sample mean is significantly different from zero. For some daily return series, a 
simple AR model might be needed. In some cases, the mean equation may employ 
some explanatory variables such as an indicator variable for weekend or January 
effects. 

In what follows, we use R (both with and without Ox) and S-Plus in empirical
illustrations. Other software packages (e.g., Eviews, SCA, and RATS) can also be 
used. 


3.3.1 Testing for ARCH Effect 


For ease in notation, let $a_t = r_t - \mu_t$ be the residuals of the mean equation. The
squared series $a_t^2$ is then used to check for conditional heteroscedasticity, which
is also known as the ARCH effects. Two tests are available. The first test is to
apply the usual Ljung-Box statistics Q(m) to the $\{a_t^2\}$ series; see McLeod and Li
(1983). The null hypothesis is that the first m lags of ACF of the $a_t^2$ series are zero.
The second test for conditional heteroscedasticity is the Lagrange multiplier test
of Engle (1982). This test is equivalent to the usual F statistic for testing $\alpha_i = 0$
(i = 1,...,m) in the linear regression

$$a_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \cdots + \alpha_m a_{t-m}^2 + e_t, \quad t = m+1, \ldots, T,$$

where $e_t$ denotes the error term, m is a prespecified positive integer, and T is
the sample size. Specifically, the null hypothesis is $H_0: \alpha_1 = \cdots = \alpha_m = 0$. Let
$SSR_0 = \sum_{t=m+1}^{T} (a_t^2 - \bar{\omega})^2$, where $\bar{\omega} = (1/T) \sum_{t=1}^{T} a_t^2$ is the sample mean of $a_t^2$,
and $SSR_1 = \sum_{t=m+1}^{T} \hat{e}_t^2$, where $\hat{e}_t$ is the least-squares residual of the prior linear
regression. Then we have

$$F = \frac{(SSR_0 - SSR_1)/m}{SSR_1/(T - 2m - 1)},$$

which is asymptotically distributed as a chi-squared distribution with m degrees of
freedom under the null hypothesis. The decision rule is to reject the null hypothesis
if $F > \chi_m^2(\alpha)$, where $\chi_m^2(\alpha)$ is the upper $100(1 - \alpha)$th percentile of $\chi_m^2$, or the
p value of F is less than $\alpha$, type-I error.
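
The F statistic above can be computed with one least-squares regression. The following is a minimal sketch in base R. It uses a simulated ARCH(1) series rather than the Intel data, and the helper name arch_lm_test is hypothetical (it is not a function of any package). The p value follows the chi-squared decision rule stated above.

```r
# Hypothetical helper (not from any package): F form of Engle's LM test.
arch_lm_test <- function(a, m) {
  a2   <- a^2
  n    <- length(a2)                    # sample size T
  X    <- embed(a2, m + 1)              # col 1: a_t^2; cols 2..(m+1): lags 1..m
  y    <- X[, 1]
  lags <- X[, -1, drop = FALSE]
  ssr1 <- sum(resid(lm(y ~ lags))^2)    # SSR_1 from the auxiliary regression
  ssr0 <- sum((y - mean(a2))^2)         # SSR_0 around the full-sample mean of a_t^2
  Fstat <- ((ssr0 - ssr1) / m) / (ssr1 / (n - 2 * m - 1))
  c(F = Fstat, p.chisq = pchisq(Fstat, df = m, lower.tail = FALSE))
}

# Simulated ARCH(1) shocks with alpha_0 = 0.01 and alpha_1 = 0.5 (illustrative values)
set.seed(1)
n <- 3000
a <- numeric(n)
a[1] <- sqrt(0.01 / (1 - 0.5)) * rnorm(1)
for (t in 2:n) a[t] <- sqrt(0.01 + 0.5 * a[t - 1]^2) * rnorm(1)

res <- arch_lm_test(a, m = 5)
print(res)
```

A small p value here indicates ARCH effects in the simulated series; applying the same function to an iid series would typically give a large p value.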

To demonstrate, we consider the monthly log stock returns of Intel Corporation 
from 1973 to 2008; see Example 3.1. The series does not have significant serial 
correlations so that it can be directly used to test for the ARCH effect. Indeed, 
the Q(m) statistics of the return series give Q(12) = 18.26 with a p value of 
0.11, confirming no serial correlations in the data. On the other hand, the Lagrange 
multiplier test shows strong ARCH effects with test statistic $F \approx 53.62$, the p
value of which is close to zero. The Ljung-Box statistics of the $a_t^2$ series also
show strong ARCH effects with Q(12) = 89.85, the p value of which is close to
zero.

S-Plus Demonstration
Denote the return series by intc. Note that the command archTest applies
directly to the $a_t$ series, not to $a_t^2$.


> da=read.table("m-intc7308.txt",header=T)
> intc=log(da[,2]+1)
> autocorTest(intc,lag=12)

Test for Autocorrelation: Ljung-Box
Null Hypothesis: no autocorrelation

Test Statistics:
Test Stat 18.2635    p.value 0.1079

Dist. under Null: chi-square with 12 degrees of freedom
Total Observ.: 432

> archTest(intc,lag=12)

Test for ARCH Effects: LM Test
Null Hypothesis: no ARCH effects

Test Statistics:
Test Stat 53.6197    p.value 0.0000

Dist. under Null: chi-square with 12 degrees of freedom

R Demonstration 


> da=read.table("m-intc7308.txt",header=T)
> intc=log(da[,2]+1)
> Box.test(intc,lag=12,type='Ljung')

        Box-Ljung test

data:  intc
X-squared = 18.2635, df = 12, p-value = 0.1079

> at=intc-mean(intc)
> Box.test(at^2,lag=12,type='Ljung')

        Box-Ljung test

data:  at^2
X-squared = 89.8509, df = 12, p-value = 5.274e-14


3.4 THE ARCH MODEL 


The first model that provides a systematic framework for volatility modeling 
is the ARCH model of Engle (1982). The basic idea of ARCH models is that 
(a) the shock $a_t$ of an asset return is serially uncorrelated, but dependent, and
(b) the dependence of $a_t$ can be described by a simple quadratic function of its
lagged values. Specifically, an ARCH(m) model assumes that

$$a_t = \sigma_t \epsilon_t, \quad \sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \cdots + \alpha_m a_{t-m}^2, \tag{3.5}$$

where $\{\epsilon_t\}$ is a sequence of independent and identically distributed (iid) random
variables with mean zero and variance 1, $\alpha_0 > 0$, and $\alpha_i \geq 0$ for $i > 0$. The
coefficients $\alpha_i$ must satisfy some regularity conditions to ensure that the unconditional
variance of $a_t$ is finite. In practice, $\epsilon_t$ is often assumed to follow the standard
normal or a standardized Student-t or a generalized error distribution.

From the structure of the model, it is seen that large past squared shocks $\{a_{t-i}^2\}_{i=1}^{m}$
imply a large conditional variance $\sigma_t^2$ for the innovation $a_t$. Consequently, $a_t$ tends
to assume a large value (in modulus). This means that, under the ARCH framework,
large shocks tend to be followed by another large shock. Here I use the word
tend because a large variance does not necessarily produce a large realization. It
only says that the probability of obtaining a large variate is greater than that of
a smaller variance. This feature is similar to the volatility clusterings observed in
asset returns.
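
The statement that ARCH shocks are serially uncorrelated but dependent can be illustrated by simulation. The sketch below, in base R, generates an ARCH(1) series from Eq. (3.5) with Gaussian $\epsilon_t$ and compares the sample ACF of $a_t$ with that of $a_t^2$. The function name sim_arch1 and the parameter values are assumptions chosen only for illustration.

```r
# Hypothetical helper: simulate a Gaussian ARCH(1) series of length n.
sim_arch1 <- function(n, alpha0, alpha1, burn = 500) {
  e <- rnorm(n + burn)
  a <- numeric(n + burn)
  a[1] <- sqrt(alpha0 / (1 - alpha1)) * e[1]    # start at the unconditional sd
  for (t in 2:(n + burn)) {
    a[t] <- sqrt(alpha0 + alpha1 * a[t - 1]^2) * e[t]
  }
  a[-(1:burn)]                                   # drop the burn-in period
}

set.seed(42)
a <- sim_arch1(5000, alpha0 = 0.01, alpha1 = 0.4)

acf_a  <- acf(a,   lag.max = 5, plot = FALSE)$acf[-1]   # lags 1..5 of a_t
acf_a2 <- acf(a^2, lag.max = 5, plot = FALSE)$acf[-1]   # lags 1..5 of a_t^2
print(round(rbind(acf_a, acf_a2), 3))
```

The ACF of the level series should be close to zero at all lags, whereas the squared series should show clearly positive autocorrelation at lag 1.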

The ARCH effect also occurs in other financial time series. Figure 3.3 shows
the time plots of (a) the percentage changes in Deutsche mark/U.S. dollar exchange
rate measured in 10-minute intervals from June 5, 1989, to June 19, 1989, for 2488
observations, and (b) the squared series of the percentage changes. Big percentage
changes occurred occasionally, but there were certain stable periods. Figure 3.4(a)
shows the sample ACF of the percentage change series. Clearly, the series has no
serial correlation. Figure 3.4(b) shows the sample PACF of the squared series of
percentage change. It is seen that there are some big spikes in the PACF. Such
spikes suggest that the percentage changes are not serially independent and have
some ARCH effects.

Figure 3.3 (a) Time plot of 10-minute returns of exchange rate between Deutsche mark and U.S.
dollar from June 5, 1989, to June 19, 1989, and (b) the squared returns.

Figure 3.4 (a) Sample autocorrelation function of return series of mark/dollar exchange rate and (b)
sample partial autocorrelation function of squared returns.


Remark. Some authors use $h_t$ to denote the conditional variance in Eq. (3.5).
In this case, the shock becomes $a_t = \sqrt{h_t}\,\epsilon_t$.


3.4.1 Properties of ARCH Models 
To understand the ARCH models, it pays to carefully study the ARCH(1) model

$$a_t = \sigma_t \epsilon_t, \quad \sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2,$$

where $\alpha_0 > 0$ and $\alpha_1 \geq 0$. First, the unconditional mean of $a_t$ remains zero because

$$E(a_t) = E[E(a_t \mid F_{t-1})] = E[\sigma_t E(\epsilon_t)] = 0.$$

Second, the unconditional variance of $a_t$ can be obtained as

$$\mathrm{Var}(a_t) = E(a_t^2) = E[E(a_t^2 \mid F_{t-1})] = E(\alpha_0 + \alpha_1 a_{t-1}^2) = \alpha_0 + \alpha_1 E(a_{t-1}^2).$$

Because $a_t$ is a stationary process with $E(a_t) = 0$, $\mathrm{Var}(a_t) = \mathrm{Var}(a_{t-1}) = E(a_{t-1}^2)$.
Therefore, we have $\mathrm{Var}(a_t) = \alpha_0 + \alpha_1 \mathrm{Var}(a_t)$ and $\mathrm{Var}(a_t) = \alpha_0/(1 - \alpha_1)$. Since the
variance of $a_t$ must be positive, we require $0 \leq \alpha_1 < 1$. Third, in some applications,
we need higher order moments of $a_t$ to exist and, hence, $\alpha_1$ must also satisfy some
additional constraints. For instance, to study its tail behavior, we require that the
fourth moment of $a_t$ is finite. Under the normality assumption of $\epsilon_t$ in Eq. (3.5),
we have

$$E(a_t^4 \mid F_{t-1}) = 3[E(a_t^2 \mid F_{t-1})]^2 = 3(\alpha_0 + \alpha_1 a_{t-1}^2)^2.$$

Therefore,

$$E(a_t^4) = E[E(a_t^4 \mid F_{t-1})] = 3E(\alpha_0 + \alpha_1 a_{t-1}^2)^2 = 3E(\alpha_0^2 + 2\alpha_0\alpha_1 a_{t-1}^2 + \alpha_1^2 a_{t-1}^4).$$

If $a_t$ is fourth-order stationary with $m_4 = E(a_t^4)$, then we have

$$m_4 = 3[\alpha_0^2 + 2\alpha_0\alpha_1 \mathrm{Var}(a_t) + \alpha_1^2 m_4] = 3\alpha_0^2\left(1 + 2\frac{\alpha_1}{1 - \alpha_1}\right) + 3\alpha_1^2 m_4.$$

Consequently,

$$m_4 = \frac{3\alpha_0^2(1 + \alpha_1)}{(1 - \alpha_1)(1 - 3\alpha_1^2)}.$$

This result has two important implications: (a) since the fourth moment of $a_t$
is positive, we see that $\alpha_1$ must also satisfy the condition $1 - 3\alpha_1^2 > 0$; that is,
$0 \leq \alpha_1^2 < \frac{1}{3}$; and (b) the unconditional kurtosis of $a_t$ is

$$\frac{E(a_t^4)}{[\mathrm{Var}(a_t)]^2} = 3\frac{\alpha_0^2(1 + \alpha_1)}{(1 - \alpha_1)(1 - 3\alpha_1^2)} \times \frac{(1 - \alpha_1)^2}{\alpha_0^2} = 3\frac{1 - \alpha_1^2}{1 - 3\alpha_1^2} > 3.$$

Thus, the excess kurtosis of $a_t$ is positive and the tail distribution of $a_t$ is heavier
than that of a normal distribution. In other words, the shock $a_t$ of a conditional
Gaussian ARCH(1) model is more likely than a Gaussian white noise series to
produce "outliers." This is in agreement with the empirical finding that "outliers"
appear more often in asset returns than that implied by an iid sequence of normal
random variates.
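
The moment formulas derived above are easy to evaluate numerically. The following sketch, in base R, codes the unconditional variance, fourth moment, and kurtosis of a Gaussian ARCH(1) model; the function name arch1_moments and the parameter values are hypothetical and serve only as an illustration.

```r
# Hypothetical helper: unconditional moments of a Gaussian ARCH(1) model.
arch1_moments <- function(alpha0, alpha1) {
  stopifnot(alpha0 > 0, alpha1 >= 0, alpha1 < 1)
  v <- alpha0 / (1 - alpha1)                 # Var(a_t)
  if (3 * alpha1^2 < 1) {                    # finite fourth moment
    m4 <- 3 * alpha0^2 * (1 + alpha1) / ((1 - alpha1) * (1 - 3 * alpha1^2))
    k  <- m4 / v^2                           # equals 3(1 - alpha1^2)/(1 - 3 alpha1^2)
  } else {                                   # fourth moment does not exist
    m4 <- Inf
    k  <- Inf
  }
  c(variance = v, m4 = m4, kurtosis = k)
}

mom     <- arch1_moments(0.01, 0.3)   # 3 * 0.3^2 = 0.27 < 1
mom_inf <- arch1_moments(0.01, 0.6)   # 3 * 0.6^2 = 1.08 > 1
print(mom)
print(mom_inf)
```

For $\alpha_1 = 0.3$ the kurtosis exceeds 3, whereas for $\alpha_1 = 0.6$ the condition $1 - 3\alpha_1^2 > 0$ fails and the fourth moment is reported as infinite.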

These properties continue to hold for general ARCH models, but the formulas
become more complicated for higher order ARCH models. The condition $\alpha_i \geq 0$ in
Eq. (3.5) can be relaxed. It is a condition to ensure that the conditional variance $\sigma_t^2$
is positive for all t. In fact, a natural way to achieve positiveness of the conditional
variance is to rewrite an ARCH(m) model as

$$a_t = \sigma_t \epsilon_t, \quad \sigma_t^2 = \alpha_0 + A_{m,t-1}' \Omega A_{m,t-1}, \tag{3.6}$$

where $A_{m,t-1} = (a_{t-1}, \ldots, a_{t-m})'$ and $\Omega$ is an m x m nonnegative definite matrix.
The ARCH(m) model in Eq. (3.5) requires $\Omega$ to be diagonal. Thus, Engle's model
uses a parsimonious approach to approximate a quadratic function. A simple way to
achieve Eq. (3.6) is to employ a random-coefficient model for $a_t$; see the CHARMA
and RCA models discussed later.


3.4.2 Weaknesses of ARCH Models 


The advantages of ARCH models include properties discussed in the previous 
section. The model also has some weaknesses: 


1. The model assumes that positive and negative shocks have the same effects 
on volatility because it depends on the square of the previous shocks. In 
practice, it is well known that the price of a financial asset responds differently 
to positive and negative shocks. 

2. The ARCH model is rather restrictive. For instance, $\alpha_1^2$ of an ARCH(1) model
must be in the interval $[0, \frac{1}{3})$ if the series has a finite fourth moment. The
constraint becomes complicated for higher order ARCH models. In practice, 
it limits the ability of ARCH models with Gaussian innovations to capture 
excess kurtosis. 

3. The ARCH model does not provide any new insight for understanding the 
source of variations of a financial time series. It merely provides a mechanical 
way to describe the behavior of the conditional variance. It gives no indication 
about what causes such behavior to occur. 

4. ARCH models are likely to overpredict the volatility because they respond 
slowly to large isolated shocks to the return series. 


3.4.3 Building an ARCH Model 


Among volatility models, specifying an ARCH model is relatively easy. Details 
are given below. 


Order Determination 

If an ARCH effect is found to be significant, one can use the PACF of $a_t^2$ to
determine the ARCH order. Using PACF of $a_t^2$ to select the ARCH order can be
justified as follows. From the model in Eq. (3.5), we have

$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \cdots + \alpha_m a_{t-m}^2.$$

For a given sample, $a_t^2$ is an unbiased estimate of $\sigma_t^2$. Therefore, we expect that $a_t^2$
is linearly related to $a_{t-1}^2, \ldots, a_{t-m}^2$ in a manner similar to that of an autoregressive
model of order m. Note that a single $a_t^2$ is generally not an efficient estimate of
$\sigma_t^2$, but it can serve as an approximation that could be informative in specifying
the order m.

Alternatively, define $\eta_t = a_t^2 - \sigma_t^2$. It can be shown that $\{\eta_t\}$ is an uncorrelated
series with mean 0. The ARCH model then becomes

$$a_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \cdots + \alpha_m a_{t-m}^2 + \eta_t,$$

which is in the form of an AR(m) model for $a_t^2$, except that $\{\eta_t\}$ is not an iid series.
From Chapter 2, PACF of $a_t^2$ is a useful tool to determine the order m. Because
$\{\eta_t\}$ are not identically distributed, the least-squares estimates of the prior model
are consistent but not efficient. The PACF of $a_t^2$ may not be effective when the
sample size is small.
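
As an illustration of this order-determination step, the sketch below simulates a Gaussian ARCH(2) series in base R and inspects the sample PACF of $a_t^2$. The parameter values and the rough bound $2/\sqrt{T}$ are assumptions made for this example; they are not taken from the data sets of this chapter.

```r
# Simulate a Gaussian ARCH(2) series (illustrative parameter values).
set.seed(7)
n <- 5000
burn <- 500
alpha <- c(0.01, 0.3, 0.2)                     # (alpha_0, alpha_1, alpha_2)
e <- rnorm(n + burn)
a <- numeric(n + burn)
a[1:2] <- sqrt(alpha[1] / (1 - sum(alpha[-1]))) * e[1:2]
for (t in 3:(n + burn)) {
  a[t] <- sqrt(alpha[1] + alpha[2] * a[t - 1]^2 + alpha[3] * a[t - 2]^2) * e[t]
}
a <- a[-(1:burn)]

# Sample PACF of the squared series and the lags exceeding a rough 2/sqrt(T) bound
p <- as.numeric(pacf(a^2, lag.max = 10, plot = FALSE)$acf)
bound <- 2 / sqrt(n)
signif_lags <- which(abs(p) > bound)
print(round(p, 3))
print(signif_lags)
```

The first two lags should stand out, which points to an ARCH order of about 2. Isolated exceedances at higher lags can still occur by chance, so the PACF is a guide rather than a formal test.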


Estimation 
Several likelihood functions are commonly used in ARCH estimation, depending on
the distributional assumption of $\epsilon_t$. Under the normality assumption, the likelihood
function of an ARCH(m) model is

$$f(a_1, \ldots, a_T \mid \boldsymbol{\alpha}) = f(a_T \mid F_{T-1}) f(a_{T-1} \mid F_{T-2}) \cdots f(a_{m+1} \mid F_m) f(a_1, \ldots, a_m \mid \boldsymbol{\alpha})$$

$$= \prod_{t=m+1}^{T} \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left(-\frac{a_t^2}{2\sigma_t^2}\right) \times f(a_1, \ldots, a_m \mid \boldsymbol{\alpha}),$$

where $\boldsymbol{\alpha} = (\alpha_0, \alpha_1, \ldots, \alpha_m)'$ and $f(a_1, \ldots, a_m \mid \boldsymbol{\alpha})$ is the joint probability density
function of $a_1, \ldots, a_m$. Since the exact form of $f(a_1, \ldots, a_m \mid \boldsymbol{\alpha})$ is complicated,
it is commonly dropped from the prior likelihood function, especially when the
sample size is sufficiently large. This results in using the conditional-likelihood
function

$$f(a_{m+1}, \ldots, a_T \mid \boldsymbol{\alpha}, a_1, \ldots, a_m) = \prod_{t=m+1}^{T} \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left(-\frac{a_t^2}{2\sigma_t^2}\right),$$

where $\sigma_t^2$ can be evaluated recursively. We refer to estimates obtained by
maximizing the prior likelihood function as the conditional maximum-likelihood estimates
(MLEs) under normality.

Maximizing the conditional-likelihood function is equivalent to maximizing its
logarithm, which is easier to handle. The conditional log-likelihood function is

$$\ell(a_{m+1}, \ldots, a_T \mid \boldsymbol{\alpha}, a_1, \ldots, a_m) = \sum_{t=m+1}^{T} \left[-\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln(\sigma_t^2) - \frac{1}{2}\frac{a_t^2}{\sigma_t^2}\right].$$

Since the first term $\ln(2\pi)$ does not involve any parameters, the log-likelihood
function becomes

$$\ell(a_{m+1}, \ldots, a_T \mid \boldsymbol{\alpha}, a_1, \ldots, a_m) = -\sum_{t=m+1}^{T} \left[\frac{1}{2}\ln(\sigma_t^2) + \frac{1}{2}\frac{a_t^2}{\sigma_t^2}\right],$$

where $\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \cdots + \alpha_m a_{t-m}^2$ can be evaluated recursively.
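
The conditional log-likelihood above can be maximized with a general-purpose optimizer. The following base R sketch is a minimal illustration on simulated ARCH(1) data and is not the estimation routine used by S-Plus or fGarch. The function name arch_nll is hypothetical, and the log reparameterization (used to keep every $\alpha_i$ positive) is an assumption of this sketch.

```r
# Negative Gaussian conditional log likelihood (constant term dropped).
# theta holds log(alpha_0), ..., log(alpha_m) so that every alpha_i stays positive;
# this reparameterization is a convenience of the sketch, not part of the text.
arch_nll <- function(theta, a, m) {
  alpha <- exp(theta)
  A2    <- embed(a^2, m + 1)                   # a_t^2 and its first m lags
  sig2  <- as.numeric(alpha[1] + A2[, -1, drop = FALSE] %*% alpha[-1])
  sum(0.5 * log(sig2) + 0.5 * A2[, 1] / sig2)
}

# Simulated ARCH(1) data with alpha_0 = 0.01 and alpha_1 = 0.4 (illustrative values)
set.seed(123)
n <- 5000
a <- numeric(n)
a[1] <- sqrt(0.01 / (1 - 0.4)) * rnorm(1)
for (t in 2:n) a[t] <- sqrt(0.01 + 0.4 * a[t - 1]^2) * rnorm(1)

fit <- optim(log(c(var(a) / 2, 0.2)), arch_nll, a = a, m = 1, method = "BFGS")
alpha_hat <- exp(fit$par)                      # estimates of (alpha_0, alpha_1)
print(alpha_hat)
```

With a sample of this size the estimates should be close to the values used in the simulation. Standard errors could be obtained from the Hessian, which is omitted here to keep the sketch short.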

In some applications, it is more appropriate to assume that $\epsilon_t$ follows a
heavy-tailed distribution such as a standardized Student-t distribution. Let $x_v$ be a
Student-t distribution with v degrees of freedom. Then $\mathrm{Var}(x_v) = v/(v - 2)$ for
$v > 2$, and we use $\epsilon_t = x_v/\sqrt{v/(v - 2)}$. The probability density function of $\epsilon_t$ is

$$f(\epsilon_t \mid v) = \frac{\Gamma[(v + 1)/2]}{\Gamma(v/2)\sqrt{(v - 2)\pi}} \left(1 + \frac{\epsilon_t^2}{v - 2}\right)^{-(v+1)/2}, \quad v > 2, \tag{3.7}$$

where $\Gamma(x)$ is the usual gamma function (i.e., $\Gamma(x) = \int_0^\infty y^{x-1} e^{-y}\,dy$). Using
$a_t = \sigma_t \epsilon_t$, we obtain the conditional-likelihood function of $a_t$ as

$$f(a_{m+1}, \ldots, a_T \mid \boldsymbol{\alpha}, A_m) = \prod_{t=m+1}^{T} \frac{\Gamma[(v + 1)/2]}{\Gamma(v/2)\sqrt{(v - 2)\pi}} \frac{1}{\sigma_t} \left[1 + \frac{a_t^2}{(v - 2)\sigma_t^2}\right]^{-(v+1)/2},$$

where $v > 2$ and $A_m = (a_1, a_2, \ldots, a_m)$. We refer to the estimates that maximize the
prior likelihood function as the conditional MLEs under t distribution. The degrees
of freedom of the t distribution can be specified a priori or estimated jointly with
other parameters. A value between 4 and 8 is often used if it is prespecified.

If the degrees of freedom v of the Student-t distribution is prespecified, then
the conditional log-likelihood function is

$$\ell(a_{m+1}, \ldots, a_T \mid \boldsymbol{\alpha}, A_m) = -\sum_{t=m+1}^{T} \left[\frac{v + 1}{2} \ln\left(1 + \frac{a_t^2}{(v - 2)\sigma_t^2}\right) + \frac{1}{2}\ln(\sigma_t^2)\right]. \tag{3.8}$$

If one wishes to estimate v jointly with other parameters, then the log-likelihood
function becomes

$$\ell(a_{m+1}, \ldots, a_T \mid \boldsymbol{\alpha}, v, A_m) = (T - m)\left\{\ln\left[\Gamma\left(\frac{v + 1}{2}\right)\right] - \ln\left[\Gamma\left(\frac{v}{2}\right)\right] - 0.5\ln[(v - 2)\pi]\right\} + \ell(a_{m+1}, \ldots, a_T \mid \boldsymbol{\alpha}, A_m),$$

where the second term is given in Eq. (3.8).
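
The joint log-likelihood with Student-t innovations can be coded directly from the two expressions above. The sketch below, in base R, evaluates it for given parameters and cross-checks the result against R's built-in dt density after rescaling to unit variance. The function name arch_t_loglik and the input values are hypothetical.

```r
# Hypothetical helper: conditional log likelihood of an ARCH(m) model with
# standardized Student-t innovations; alpha = (alpha_0, ..., alpha_m), v > 2.
arch_t_loglik <- function(alpha, v, a) {
  m     <- length(alpha) - 1
  A2    <- embed(a^2, m + 1)                    # a_t^2 and its first m lags
  sig2  <- as.numeric(alpha[1] + A2[, -1, drop = FALSE] %*% alpha[-1])
  at2   <- A2[, 1]
  n_eff <- length(at2)                          # T - m
  n_eff * (lgamma((v + 1) / 2) - lgamma(v / 2) - 0.5 * log((v - 2) * pi)) -
    sum((v + 1) / 2 * log(1 + at2 / ((v - 2) * sig2)) + 0.5 * log(sig2))
}

set.seed(2)
a     <- 0.1 * rt(300, df = 6)                  # arbitrary illustrative series
alpha <- c(0.01, 0.3)
v     <- 6
ll_formula <- arch_t_loglik(alpha, v, a)

# Cross-check: density of a_t is (s / sigma_t) * dt(s * a_t / sigma_t, v),
# where s = sqrt(v / (v - 2)) rescales the Student-t to unit variance.
s     <- sqrt(v / (v - 2))
sig   <- sqrt(alpha[1] + alpha[2] * a[-length(a)]^2)
ll_dt <- sum(log(dt(a[-1] / sig * s, df = v) * s / sig))
print(c(ll_formula, ll_dt))
```

The two numbers should agree up to rounding error, which confirms that the formula and the rescaled built-in density describe the same likelihood.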
Besides fat tails, empirical distributions of asset returns may also be skewed.
To handle this additional characteristic of asset returns, the Student-t distribution
has been modified to become a skew-Student-t distribution. There are multiple
versions of skew-Student-t distribution, but we shall adopt the approach of Fernandez
and Steel (1998), which can introduce skewness into any continuous unimodal and
symmetric (with respect to 0) univariate distribution. Specifically, for the
innovation $\epsilon_t$ of an ARCH process, Lambert and Laurent (2001) apply the Fernandez and
Steel method to the standardized Student-t distribution in Eq. (3.7) to obtain a
standardized skew-Student-t distribution. The resulting probability density function is

$$g(\epsilon_t \mid \xi, v) = \begin{cases} \dfrac{2}{\xi + 1/\xi}\, \varrho f[\xi(\varrho\epsilon_t + \varpi) \mid v] & \text{if } \epsilon_t < -\varpi/\varrho, \\[2ex] \dfrac{2}{\xi + 1/\xi}\, \varrho f[(\varrho\epsilon_t + \varpi)/\xi \mid v] & \text{if } \epsilon_t \geq -\varpi/\varrho, \end{cases} \tag{3.9}$$

where $f(\cdot)$ is the probability density function (pdf) of the standardized Student-t
distribution in Eq. (3.7), $\xi$ is the skewness parameter, $v > 2$ is the degrees of
freedom, and the parameters $\varrho$ and $\varpi$ are given below:

$$\varpi = \frac{\Gamma[(v - 1)/2]\sqrt{v - 2}}{\sqrt{\pi}\,\Gamma(v/2)} \left(\xi - \frac{1}{\xi}\right), \quad \varrho^2 = \left(\xi^2 + \frac{1}{\xi^2} - 1\right) - \varpi^2.$$

In Eq. (3.9), $\xi^2$ is equal to the ratio of probability masses above and below the
mode of the distribution and, hence, it is a measure of the skewness.
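
The density in Eq. (3.9) can be checked numerically. The sketch below codes Eqs. (3.7) and (3.9) in base R and verifies by numerical integration that the skew density integrates to one and has mean zero and unit variance. The function names std_t_pdf and skew_t_pdf and the values $\xi = 1.5$, $v = 6$ are assumptions for illustration; in practice the sstd functions of fGarch mentioned in the Remark that follows would be used.

```r
# Standardized Student-t pdf of Eq. (3.7)
std_t_pdf <- function(e, v) {
  exp(lgamma((v + 1) / 2) - lgamma(v / 2)) / sqrt((v - 2) * pi) *
    (1 + e^2 / (v - 2))^(-(v + 1) / 2)
}

# Standardized skew-Student-t pdf of Eq. (3.9)
skew_t_pdf <- function(e, xi, v) {
  varpi  <- exp(lgamma((v - 1) / 2) - lgamma(v / 2)) * sqrt(v - 2) / sqrt(pi) *
    (xi - 1 / xi)
  varrho <- sqrt(xi^2 + 1 / xi^2 - 1 - varpi^2)
  z <- varrho * e + varpi                     # z < 0 is the same as e < -varpi/varrho
  2 / (xi + 1 / xi) * varrho *
    ifelse(z < 0, std_t_pdf(xi * z, v), std_t_pdf(z / xi, v))
}

xi <- 1.5
v  <- 6
total  <- integrate(function(e) skew_t_pdf(e, xi, v), -Inf, Inf)$value
mean_e <- integrate(function(e) e * skew_t_pdf(e, xi, v), -Inf, Inf)$value
var_e  <- integrate(function(e) e^2 * skew_t_pdf(e, xi, v), -Inf, Inf)$value
print(c(total = total, mean = mean_e, variance = var_e))
```

Setting $\xi = 1$ gives $\varpi = 0$ and $\varrho = 1$, so the skew density reduces to the symmetric density of Eq. (3.7).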
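Finally, $\epsilon_t$ may assume a generalized error distribution (GED) with probability
density function

$$f(x) = \frac{v \exp\left(-\frac{1}{2}|x/\lambda|^v\right)}{\lambda 2^{(1+1/v)} \Gamma(1/v)}, \quad -\infty < x < \infty, \quad 0 < v \leq \infty, \tag{3.10}$$

where $\Gamma(\cdot)$ is the gamma function and $\lambda = [2^{(-2/v)} \Gamma(1/v)/\Gamma(3/v)]^{1/2}$. This
distribution reduces to a Gaussian distribution if $v = 2$, and it has heavy tails when
$v < 2$. The conditional log-likelihood function $\ell(a_{m+1}, \ldots, a_T \mid \boldsymbol{\alpha}, A_m)$ can easily
be obtained.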
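
The two properties of the GED stated above can be verified numerically. The following base R sketch codes Eq. (3.10) under the hypothetical name ged_pdf, confirms that $v = 2$ reproduces the standard normal density, and checks the total mass and variance for $v = 1.5$.

```r
# GED pdf of Eq. (3.10); ged_pdf is a hypothetical name used for illustration.
ged_pdf <- function(x, v) {
  lambda <- sqrt(2^(-2 / v) * gamma(1 / v) / gamma(3 / v))
  v * exp(-0.5 * abs(x / lambda)^v) / (lambda * 2^(1 + 1 / v) * gamma(1 / v))
}

x <- seq(-4, 4, by = 0.5)
gauss_gap <- max(abs(ged_pdf(x, 2) - dnorm(x)))   # v = 2: Gaussian case

total <- integrate(function(u) ged_pdf(u, 1.5), -Inf, Inf)$value
var_x <- integrate(function(u) u^2 * ged_pdf(u, 1.5), -Inf, Inf)$value

# Heavier tails when v < 2: compare the densities far from the center
tail_ratio <- ged_pdf(4, 1.5) / dnorm(4)
print(c(gauss_gap = gauss_gap, total = total, variance = var_x, tail_ratio = tail_ratio))
```

A tail ratio greater than one at $x = 4$ reflects the heavier tails of the GED with $v = 1.5$ relative to the normal distribution.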


Remark. Skew Student-t, skew normal, and skew GED distributions are
available in the fGarch package of Rmetrics. The commands are sstd, snorm, and
sged, respectively. See the R demonstration below for an example.


Model Checking 
For a properly specified ARCH model, the standardized residuals

$$\tilde{a}_t = \frac{a_t}{\sigma_t}$$

form a sequence of iid random variables. Therefore, one can check the adequacy
of a fitted ARCH model by examining the series $\{\tilde{a}_t\}$. In particular, the Ljung-Box
statistics of $\tilde{a}_t$ can be used to check the adequacy of the mean equation and that
of $\tilde{a}_t^2$ can be used to test the validity of the volatility equation. The skewness,
kurtosis, and quantile-to-quantile plot (i.e., QQ plot) of $\{\tilde{a}_t\}$ can be used to check
the validity of the distribution assumption. Many residual plots are available in
S-Plus for model checking.
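
These checks can also be carried out in base R with Box.test. The sketch below uses a simulated ARCH(1) series and, for simplicity, standardizes with the true parameters instead of fitted ones, which is an assumption of this illustration. It contrasts a misspecified constant-variance model with the correct ARCH(1) volatility equation.

```r
# Simulated ARCH(1) shocks with alpha_0 = 0.01 and alpha_1 = 0.5 (illustrative values)
set.seed(11)
n <- 2000
e <- rnorm(n)
a <- numeric(n)
a[1] <- sqrt(0.01 / (1 - 0.5)) * e[1]
for (t in 2:n) a[t] <- sqrt(0.01 + 0.5 * a[t - 1]^2) * e[t]

# Standardized residuals under a (misspecified) constant-variance model
z_const <- a / sd(a)

# Standardized residuals under the ARCH(1) model (true parameters assumed known)
sig    <- sqrt(0.01 + 0.5 * a[-n]^2)
z_arch <- a[-1] / sig

lb_const     <- Box.test(z_const^2, lag = 10, type = "Ljung-Box")  # volatility check
lb_arch_mean <- Box.test(z_arch,    lag = 10, type = "Ljung-Box")  # mean equation check
lb_arch_vol  <- Box.test(z_arch^2,  lag = 10, type = "Ljung-Box")  # volatility check
print(c(const = lb_const$p.value, arch_mean = lb_arch_mean$p.value,
        arch_vol = lb_arch_vol$p.value))
```

The constant-variance model should be rejected by the test on squared residuals. A QQ plot of z_arch, for example with qqnorm, can be used to examine the distributional assumption.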


Forecasting 

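Forecasts of the ARCH model in Eq. (3.5) can be obtained recursively as those
of an AR model. Consider an ARCH(m) model. At the forecast origin h, the
1-step-ahead forecast of $\sigma_{h+1}^2$ is

$$\sigma_h^2(1) = \alpha_0 + \alpha_1 a_h^2 + \cdots + \alpha_m a_{h+1-m}^2.$$

The 2-step-ahead forecast is

$$\sigma_h^2(2) = \alpha_0 + \alpha_1 \sigma_h^2(1) + \alpha_2 a_h^2 + \cdots + \alpha_m a_{h+2-m}^2,$$

and the $\ell$-step-ahead forecast for $\sigma_{h+\ell}^2$ is

$$\sigma_h^2(\ell) = \alpha_0 + \sum_{i=1}^{m} \alpha_i \sigma_h^2(\ell - i), \tag{3.11}$$

where $\sigma_h^2(\ell - i) = a_{h+\ell-i}^2$ if $\ell - i \leq 0$.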
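
The recursion in Eq. (3.11) takes only a few lines of code. The following base R sketch uses the hypothetical helper arch_forecast; the coefficients 0.0111 and 0.356 are the rounded ARCH(1) estimates reported for the Intel returns in Section 3.4.4, while the last shock $a_h = 0.1$ and the ARCH(2) inputs are made-up values for illustration.

```r
# Hypothetical helper: 1- to L-step-ahead variance forecasts of an ARCH(m) model.
# alpha = (alpha_0, ..., alpha_m); a = observed shocks up to the forecast origin h.
arch_forecast <- function(alpha, a, L) {
  m <- length(alpha) - 1
  h <- length(a)
  # s2[1..m] holds a_{h+1-m}^2, ..., a_h^2; s2[m + l] will hold sigma_h^2(l)
  s2 <- c(a[(h - m + 1):h]^2, numeric(L))
  for (l in 1:L) {
    idx <- m + l
    s2[idx] <- alpha[1] + sum(alpha[-1] * s2[idx - (1:m)])
  }
  s2[-(1:m)]
}

# ARCH(1) example: forecasts converge to alpha_0 / (1 - alpha_1)
f1 <- arch_forecast(c(0.0111, 0.356), a = 0.1, L = 50)
print(sqrt(f1[1:5]))                      # volatility forecasts, steps 1 to 5

# ARCH(2) example with made-up inputs (a_{h-1} = 0.1, a_h = -0.2)
f2 <- arch_forecast(c(0.01, 0.2, 0.1), a = c(0.1, -0.2), L = 2)
print(f2)
```

As the horizon increases, the ARCH(1) forecasts approach the unconditional variance $\alpha_0/(1 - \alpha_1)$, in line with the AR-type recursion.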


3.4.4 Some Examples 


In this section, we illustrate ARCH modeling by considering two examples. 


Example 3.1. We first apply the modeling procedure to build a simple ARCH 
model for the monthly log returns of Intel stock. The sample ACF and PACF 
of the squared returns in Figure 3.2 clearly show the existence of conditional 
heteroscedasticity. This is confirmed by the ARCH test shown in Section 3.3.1, 
and we proceed to identify the order of an ARCH model. The sample PACF in 
Figure 3.2(d) indicates that an ARCH(3) model might be appropriate. Consequently, 
we specify the model 


$$r_t = \mu + a_t, \quad a_t = \sigma_t \epsilon_t, \quad \sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \alpha_2 a_{t-2}^2 + \alpha_3 a_{t-3}^2$$


for the monthly log returns of Intel stock. Assuming that $\epsilon_t$ are iid standard normal,
we obtain the fitted model

$$r_t = 0.0122 + a_t, \quad \sigma_t^2 = 0.0106 + 0.2131 a_{t-1}^2 + 0.0770 a_{t-2}^2 + 0.0599 a_{t-3}^2,$$

where the standard errors of the parameters are 0.0057, 0.0010, 0.0757, 0.0480,
and 0.0688, respectively; see the output below. While the estimates meet the general
requirement of an ARCH(3) model, the estimates of $\alpha_2$ and $\alpha_3$ appear to be
statistically nonsignificant at the 5% level. Therefore, the model can be simplified.

S-Plus Demonstration
The following output has been edited and % marks explanation:

> module(finmetrics)
> da=read.table("m-intc7308.txt",header=T)
> intc=log(da[,2]+1)
> arch3.fit=garch(intc~1,~garch(3,0))
> summary(arch3.fit)
garch(formula.mean = intc ~ 1, formula.var = ~ garch(3, 0))

Mean Equation: structure(.Data = intc ~ 1, class = "formula")
Conditional Variance Equation:structure(.Data=~garch(3,0),..)
Conditional Distribution: gaussian

          Value Std.Error t value Pr(>|t|)
      C 0.01216 0.0056986  2.1341 0.033402
      A 0.01058 0.0009643 10.9739 0.000000
ARCH(1) 0.21307 0.0756708  2.8157 0.005093
ARCH(2) 0.07698 0.0480170  1.6032 0.109638
ARCH(3) 0.05988 0.0688081  0.8703 0.384628

> arch1=garch(intc~1,~garch(1,0))
> summary(arch1)
garch(formula.mean = intc ~ 1, formula.var = ~ garch(1, 0))

Conditional Distribution: gaussian

          Value Std.Error t value  Pr(>|t|)
      C 0.01261 0.0052624   2.397 1.695e-02
      A 0.01113 0.0009971  11.164 0.000e+00
ARCH(1) 0.35602 0.0761267   4.677 3.912e-06

Ljung-Box test for standardized residuals:
Statistic P-value Chi^2-d.f.
    14.26  0.2844         12

Ljung-Box test for squared standardized residuals:
Statistic  P-value Chi^2-d.f.
    32.11 0.001329         12

> stres=arch1$residuals/arch1$sigma.t % standardized residuals
> autocorTest(stres,lag=10)
Test for Autocorrelation: Ljung-Box
Null Hypothesis: no autocorrelation
Statistics:
Stat 12.6386, p.value 0.2446
Dist. under Null: chi-square with 10 degrees of freedom

> archTest(stres,lag=10)
Test for ARCH Effects: LM Test
Null Hypothesis: no ARCH effects
Statistics:
Stat 14.7481, p.value 0.1415
Dist. under Null: chi-square with 10 degrees of freedom

> arch1$asymp.sd % Obtain unconditional standard error
[1] 0.1314698
> plot(arch1) % Obtain various plots, including the
              % fitted volatility series.


Dropping the two nonsignificant parameters, we obtain the model

$$r_t = 0.0126 + a_t, \quad \sigma_t^2 = 0.0111 + 0.3560 a_{t-1}^2, \tag{3.12}$$

where the standard errors of the parameters are 0.0053, 0.0010, and 0.0761,
respectively. All the estimates are highly significant. Figure 3.5 shows the standardized
residuals $\{\tilde{a}_t\}$ and the sample ACF of some functions of $\{\tilde{a}_t\}$. The Ljung-Box
statistics of standardized residuals give Q(10) = 12.64 with a p value of 0.24
and those of $\{\tilde{a}_t^2\}$ give Q(10) = 14.75 with a p value of 0.14. See the output.
Consequently, the ARCH(1) model in Eq. (3.12) is adequate for describing the
conditional heteroscedasticity of the data at the 5% significance level.

The ARCH(1) model in Eq. (3.12) has some interesting properties. First, the
expected monthly log return for Intel stock is about 1.26%, which is remarkable,
especially since the data span includes the period after the Internet bubble.
Second, $\hat{\alpha}_1^2 = 0.356^2 < \frac{1}{3}$ so that the unconditional fourth moment of the monthly
log return of Intel stock exists. Third, the unconditional standard deviation of $r_t$
is $\sqrt{0.0111/(1 - 0.356)} \approx 0.1315$. Finally, the ARCH(1) model can be used to
predict the monthly volatility of Intel stock returns.


t Innovation 
For comparison, we also fit an ARCH(1) model with Student-t innovations to the
series. The resulting model is

$$r_t = 0.0169 + a_t, \quad \sigma_t^2 = 0.0120 + 0.2845 a_{t-1}^2, \tag{3.13}$$

where the standard errors of the parameters are 0.0053, 0.0017, and 0.1120,
respectively. The estimated degrees of freedom is 6.01 with standard error 1.50. All the
estimates are significant at the 5% level. The unconditional standard deviation
of $a_t$ is $\sqrt{0.0120/(1 - 0.2845)} \approx 0.1295$, which is close to that obtained under
normality. The Ljung-Box statistics of the standardized residuals give Q(12) =
14.88 with a p value of 0.25, confirming that the mean equation is adequate.
However, the Ljung-Box statistics for the squared standardized residuals show
Q(12) = 35.42 with a p value of 0.0004. The volatility equation is inadequate
at the 1% level. Further analysis shows that Q(10) = 15.90 with a p value
of 0.10 for the squared standardized residuals. The inadequacy of the volatility
equation is due to a large lag-12 ACF ($\hat{\rho}_{12} = 0.188$) of the squared standardized
residuals.

Figure 3.5 Model checking statistics of Gaussian ARCH(1) model in Eq. (3.12) for monthly log
returns of Intel stock from January 1973 to December 2008: Parts (a), (b), and (c) show sample ACF
of standardized residuals, their squared series, and absolute series, respectively; part (d) is time plot of
standardized residuals.

Comparing models (3.12) and (3.13), we see that (a) using a heavy-tailed
distribution for $\epsilon_t$ reduces the ARCH coefficient, and (b) the difference between the two
models is small for this particular instance. Finally, a more appropriate conditional
heteroscedastic model for the monthly log returns of Intel stock is a GARCH(1,1)
model, which is discussed in the next section.

S-Plus Demonstration
Note the following output with t innovations:

> arch1t=garch(intc~1,~garch(1,0),cond.dist="t")
> summary(arch1t)
Call:
garch(formula.mean=intc~1, formula.var=~garch(1,0),
  cond.dist="t")

Mean Equation: structure(.Data = intc ~ 1, class = "formula")
Cond. Variance Equation:structure(.Data=~ garch(1,0), ...)
Cond. Distribution: t
 with estimated parameter 6.012769 and standard error 1.502179

          Value Std.Error t value  Pr(>|t|)
      C 0.01688  0.005288   3.193 1.512e-03
      A 0.01195  0.001667   7.169 3.345e-12
ARCH(1) 0.28445  0.111998   2.540 1.145e-02

AIC(4) = -597.3379, BIC(4) = -581.0642

Ljung-Box test for standardized residuals:
Statistic P-value Chi^2-d.f.
    14.88  0.2482         12

Ljung-Box test for squared standardized residuals:
Statistic   P-value Chi^2-d.f.
    35.42 0.0004014         12


Remark. In S-Plus, the command garch allows for several conditional
distributions. They are specified by cond.dist = "t" or "ged". The default
is Gaussian. The R output is given below. The estimates are close to those of
S-Plus.


R Demonstration 
The following output uses the fGarch package with command garchFit and %
denotes explanation:

> da=read.table("m-intc7308.txt",header=T)
> library(fGarch) % Load the package
> intc=log(da[,2]+1)
> m1=garchFit(intc~garch(1,0),data=intc,trace=F)
> summary(m1) % Obtain results

Title:
 GARCH Modelling
Call:
 garchFit(formula=intc~garch(1,0), data=intc, trace=F)

Mean and Variance Equation: data ~ garch(1, 0) [data = intc]
Conditional Distribution: norm

Coefficient(s):
      mu    omega   alpha1
0.012637 0.011195 0.379492

Std. Errors:
 based on Hessian

Error Analysis:
       Estimate Std. Error t value Pr(>|t|)
mu     0.012637   0.005428   2.328  0.01990 *
omega  0.011195   0.001239   9.034  < 2e-16 ***
alpha1 0.379492   0.115534   3.285  0.00102 **

Log Likelihood:
 288.0589    normalized: 0.6668031

Standardised Residuals Tests:   % Model checking
                                Statistic p-Value
 Jarque-Bera Test   R    Chi^2  137.919   0
 Shapiro-Wilk Test  R    W      0.9679255 4.025172e-08
 Ljung-Box Test     R    Q(10)  12.54002  0.2505382
 Ljung-Box Test     R    Q(15)  21.33508  0.1264607
 Ljung-Box Test     R    Q(20)  23.19679  0.2792354
 Ljung-Box Test     R^2  Q(10)  16.0159   0.09917815
 Ljung-Box Test     R^2  Q(15)  36.08022  0.001721296
 Ljung-Box Test     R^2  Q(20)  37.43683  0.01036728
 LM Arch Test       R    TR^2   26.57744  0.008884587

Information Criterion Statistics:
      AIC       BIC       SIC      HQIC
-1.319717 -1.291464 -1.319813 -1.308563

> predict(m1,5) % Obtain 1 to 5-step predictions
  meanForecast meanError standardDeviation
1   0.01263656 0.1278609         0.1098306
2   0.01263656 0.1278609         0.1255897
3   0.01263656 0.1278609         0.1310751
4   0.01263656 0.1278609         0.1330976
5   0.01263656 0.1278609         0.1338571

% The next command fits a GARCH(1,1) model
> m2=garchFit(intc~garch(1,1),data=intc,trace=F)
> summary(m2) % output edited.
Coefficient(s):
        mu      omega     alpha1      beta1
0.01073352 0.00095445 0.08741989 0.85118414

Error Analysis:
        Estimate Std. Error t value Pr(>|t|)
mu     0.0107335  0.0055289   1.941  0.05222 .
omega  0.0009544  0.0003989   2.392  0.01674 *
alpha1 0.0874199  0.0269810   3.240  0.00120 **
beta1  0.8511841  0.0393702  21.620  < 2e-16 ***

                                Statistic p-Value
 Jarque-Bera Test   R    Chi^2  165.5740  0
 Shapiro-Wilk Test  R    W      0.9712087 1.626824e-07
 Ljung-Box Test     R    Q(10)  8.267633  0.6027128
 Ljung-Box Test     R    Q(15)  14.42612  0.4934871
 Ljung-Box Test     R    Q(20)  15.13331  0.7687297
 Ljung-Box Test     R^2  Q(10)  0.9891848 0.9998363
 Ljung-Box Test     R^2  Q(15)  11.36596  0.7262473
 Ljung-Box Test     R^2  Q(20)  12.68143  0.8906302
 LM Arch Test       R    TR^2   10.70199  0.5546164

% The next command fits an ARCH(1) model with Student-t dist.
> m3=garchFit(intc~garch(1,0),data=intc,trace=F,
    cond.dist='std')
> summary(m3) % Output shortened.

Call:
 garchFit(formula=intc~garch(1,0), data=intc, cond.dist="std",
   trace = F)
Mean and Variance Equation: data ~ garch(1, 0) [data = intc]
Conditional Distribution: std % Student-t distribution

Coefficient(s):
      mu    omega   alpha1    shape
0.016731 0.011939 0.285320 6.015195

Error Analysis:
       Estimate Std. Error t value Pr(>|t|)
mu     0.016731   0.005302   3.155 0.001603 **
omega  0.011939   0.001603   7.449  9.4e-14 ***
alpha1 0.285320   0.110607   2.580 0.009892 **
shape  6.015195   1.562620   3.849 0.000118 *** % Degrees of freedom

% The next command fits an ARCH(1) model with skew
% Student-t dist.
> m4=garchFit(intc~garch(1,0),data=intc,cond.dist='sstd',
    trace=F)
% Next, fit an ARMA(1,0)+GARCH(1,1) model with
% Gaussian noises.
> m5=garchFit(intc~arma(1,0)+garch(1,1),data=intc,trace=F)

R Demonstration 
The following output was generated with Ox and G@RCH4.2 package and % 
denotes explanation: 


> source("garchoxfit_R.txt") 
% In G@RCH package, an ARCH(1) model is specified as 
% GARCH(0,1). 
> m1=garchOxFit(formula.mean=~arma(0,0), 
formula.var=~garch(0,1), series=intc) 

% ** SPECIFICATIONS ** 
Dependent variable : X 
Mean Equation : ARMA (0, 0) model. 
No regressor in the mean 
Variance Equation : GARCH (0, 1) model. 
No regressor in the variance 
The distribution is a Gauss distribution. 

Maximum Likelihood Estimation (Std.Errors based on 2nd deriv.) 
               Coefficient  Std.Error  t-value  t-prob 
Cst(M)         0.012630     0.0054130  2.333    0.0201 
Cst(V)         0.011129     0.0012355  9.007    0.0000 
ARCH(Alpha1)   0.387223     0.11688    3.313    0.0010 

% ** TESTS ** 
Q-Statistics on Standardized Residuals 
Q(10)=12.4952 [0.2532785], Q(20)=23.1210 [0.2828934] 
H0: No serial correlation ==> Accept H0 when prob. is High. 
Q-Statistics on Squared Standardized Residuals 
--> P-values adjusted by 1 degree(s) of freedom 
Q(10)=15.7849 [0.0715122], Q(20)=37.0238 [0.0078807] 

ARCH 1-10 test: F(10,410)= 1.4423 [0.1592] 
% Apply Student-t distribution 
> m2=garchOxFit(formula.mean=~arma(0,0), 
formula.var=~garch(0,1), 
series=intc,cond.dist="t") 

% ** SPECIFICATIONS ** 
Dependent variable : X 
Mean Equation : ARMA (0, 0) model. 
No regressor in the mean 
Variance Equation : GARCH (0, 1) model. 
No regressor in the variance 
The distribution is a Student distribution, with 6.02272 df. 

Maximum Likelihood Estimation (Std.Errors based on 2nd deriv.) 
               Coefficient  Std.Error  t-value  t-prob 
Cst(M)         0.016702     0.0052934  3.155    0.0027 
Cst(V)         0.011870     0.0015969  7.433    0.0000 
ARCH(Alpha1)   0.292318     0.11223    2.605    0.0095 
Student(DF)    6.022723     1.5663     3.845    0.0001 

% ** TESTS ** 
Q-Statistics on Standardized Residuals 
Q(10)=13.0837 [0.2190281], Q(20)=24.0724 [0.2392436] 

Q-Statistics on Squared Standardized Residuals 
--> P-values adjusted by 1 degree(s) of freedom 
Q(10)=18.6982 [0.0278845], Q(20)=41.7182 [0.0019343] 


Example 3.2. Consider the percentage changes of the exchange rate between 
mark and dollar in 10-minute intervals. The data are shown in Figure 3.3(a). As 
shown in Figure 3.4(a), the series has no serial correlations. However, the sample 
PACF of the squared series $a_t^2$ shows some big spikes, especially at lags 1 and 3. 
There are some large PACF at higher lags, but the lower order lags tend to be more 
important. Following the procedure discussed in the previous section, we specify an 
ARCH(3) model for the series. Using the conditional Gaussian likelihood function, 
we obtain the fitted model $r_t = 0.0018 + \sigma_t\epsilon_t$ and 

$$\sigma_t^2 = 0.22 \times 10^{-2} + 0.322a_{t-1}^2 + 0.074a_{t-2}^2 + 0.093a_{t-3}^2,$$

where all the estimates in the volatility equation are statistically significant at the 5% 
significance level, and the standard errors of the parameters are $0.47 \times 10^{-6}$, 0.017, 
0.016, and 0.014, respectively. Model checking, using the standardized residual $\tilde{a}_t$, 
indicates that the model is adequate. 


3.5 THE GARCH MODEL 


Although the ARCH model is simple, it often requires many parameters to 
adequately describe the volatility process of an asset return. For instance, consider the 
monthly excess returns of S&P 500 index of Example 3.3. An ARCH(9) model is 
needed for the volatility process. Some alternative model must be sought. 
Bollerslev (1986) proposes a useful extension known as the generalized ARCH (GARCH) 
model. For a log return series $r_t$, let $a_t = r_t - \mu_t$ be the innovation at time t. Then 
$a_t$ follows a GARCH(m, s) model if 

$$a_t = \sigma_t\epsilon_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{m}\alpha_i a_{t-i}^2 + \sum_{j=1}^{s}\beta_j\sigma_{t-j}^2, \tag{3.14}$$

where again $\{\epsilon_t\}$ is a sequence of iid random variables with mean 0 and variance 
1.0, $\alpha_0 > 0$, $\alpha_i \ge 0$, $\beta_j \ge 0$, and $\sum_{i=1}^{\max(m,s)}(\alpha_i + \beta_i) < 1$. Here it is understood that 
$\alpha_i = 0$ for i > m and $\beta_j = 0$ for j > s. The latter constraint on $\alpha_i + \beta_i$ implies 
that the unconditional variance of $a_t$ is finite, whereas its conditional variance $\sigma_t^2$ 
evolves over time. As before, $\epsilon_t$ is often assumed to follow a standard normal 
or standardized Student-t distribution or generalized error distribution. Equation 
(3.14) reduces to a pure ARCH(m) model if s = 0. The $\alpha_i$ and $\beta_j$ are referred to 
as ARCH and GARCH parameters, respectively. 

To understand properties of GARCH models, it is informative to use the 
following representation. Let $\eta_t = a_t^2 - \sigma_t^2$ so that $\sigma_t^2 = a_t^2 - \eta_t$. By plugging 
$\sigma_{t-i}^2 = a_{t-i}^2 - \eta_{t-i}$ (i = 0, ..., s) into Eq. (3.14), we can rewrite the GARCH model as 

$$a_t^2 = \alpha_0 + \sum_{i=1}^{\max(m,s)}(\alpha_i + \beta_i)a_{t-i}^2 + \eta_t - \sum_{j=1}^{s}\beta_j\eta_{t-j}. \tag{3.15}$$

It is easy to check that $\{\eta_t\}$ is a martingale difference series [i.e., $E(\eta_t) = 0$ and 
$\mathrm{cov}(\eta_t, \eta_{t-j}) = 0$ for $j \ge 1$]. However, $\{\eta_t\}$ in general is not an iid sequence. 
Equation (3.15) is an ARMA form for the squared series $a_t^2$. Thus, a GARCH 
model can be regarded as an application of the ARMA idea to the squared series 
$a_t^2$. Using the unconditional mean of an ARMA model, we have 

$$E(a_t^2) = \frac{\alpha_0}{1 - \sum_{i=1}^{\max(m,s)}(\alpha_i + \beta_i)},$$

provided that the denominator of the prior fraction is positive. 
The strengths and weaknesses of GARCH models can easily be seen by focusing 
on the simplest GARCH(1,1) model with 

$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1\sigma_{t-1}^2, \qquad 0 \le \alpha_1, \beta_1 \le 1, \quad (\alpha_1 + \beta_1) < 1. \tag{3.16}$$

First, a large $a_{t-1}^2$ or $\sigma_{t-1}^2$ gives rise to a large $\sigma_t^2$. This means that a large $a_{t-1}^2$ tends 
to be followed by another large $a_t^2$, generating, again, the well-known behavior 
of volatility clustering in financial time series. Second, it can be shown that if 
$1 - 2\alpha_1^2 - (\alpha_1 + \beta_1)^2 > 0$, then 

$$\frac{E(a_t^4)}{[E(a_t^2)]^2} = \frac{3[1 - (\alpha_1 + \beta_1)^2]}{1 - (\alpha_1 + \beta_1)^2 - 2\alpha_1^2} > 3.$$

Consequently, similar to ARCH models, the tail distribution of a GARCH(1,1) 
process is heavier than that of a normal distribution. Third, the model provides a 
simple parametric function that can be used to describe the volatility evolution. 
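
The two moment formulas above are easy to evaluate numerically. The following is a minimal sketch in base R, assuming Gaussian innovations; the helper name garch11_moments and the parameter values are made up for illustration and do not come from any fitted model in this chapter.

```r
# Illustrative helper (not from any package): moments of a Gaussian
# GARCH(1,1) process implied by Eq. (3.16).
garch11_moments <- function(alpha0, alpha1, beta1) {
  persistence <- alpha1 + beta1
  stopifnot(alpha0 > 0, alpha1 >= 0, beta1 >= 0, persistence < 1)
  # Unconditional variance: alpha0 / (1 - alpha1 - beta1)
  uncond_var <- alpha0 / (1 - persistence)
  # The kurtosis is finite only if 1 - 2*alpha1^2 - (alpha1 + beta1)^2 > 0
  denom <- 1 - persistence^2 - 2 * alpha1^2
  kurt <- if (denom > 0) 3 * (1 - persistence^2) / denom else Inf
  list(variance = uncond_var, kurtosis = kurt)
}

# Made-up parameter values, chosen only for illustration
mom <- garch11_moments(alpha0 = 0.1, alpha1 = 0.1, beta1 = 0.8)
print(mom)
```

Whenever the kurtosis is finite it exceeds 3, which is the heavy-tail property stated in the text.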

Forecasts of a GARCH model can be obtained using methods similar to those 
of an ARMA model. Consider the GARCH(1,1) model in Eq. (3.16) and assume 
that the forecast origin is h. For 1-step-ahead forecast, we have 

$$\sigma_{h+1}^2 = \alpha_0 + \alpha_1 a_h^2 + \beta_1\sigma_h^2,$$

where $a_h$ and $\sigma_h^2$ are known at the time index h. Therefore, the 1-step-ahead forecast 
is 

$$\sigma_h^2(1) = \alpha_0 + \alpha_1 a_h^2 + \beta_1\sigma_h^2.$$

For multistep-ahead forecasts, we use $a_t^2 = \sigma_t^2\epsilon_t^2$ and rewrite the volatility equation 
in Eq. (3.16) as 

$$\sigma_{t+1}^2 = \alpha_0 + (\alpha_1 + \beta_1)\sigma_t^2 + \alpha_1\sigma_t^2(\epsilon_t^2 - 1).$$

When t = h + 1, the equation becomes 

$$\sigma_{h+2}^2 = \alpha_0 + (\alpha_1 + \beta_1)\sigma_{h+1}^2 + \alpha_1\sigma_{h+1}^2(\epsilon_{h+1}^2 - 1).$$

Since $E(\epsilon_{h+1}^2 - 1 \mid F_h) = 0$, the 2-step-ahead volatility forecast at the forecast origin 
h satisfies the equation 

$$\sigma_h^2(2) = \alpha_0 + (\alpha_1 + \beta_1)\sigma_h^2(1).$$

In general, we have 

$$\sigma_h^2(\ell) = \alpha_0 + (\alpha_1 + \beta_1)\sigma_h^2(\ell - 1), \qquad \ell > 1. \tag{3.17}$$

This result is exactly the same as that of an ARMA(1,1) model with AR polynomial 
$1 - (\alpha_1 + \beta_1)B$. By repeated substitutions in Eq. (3.17), we obtain that the $\ell$-step-ahead 
forecast can be written as 

$$\sigma_h^2(\ell) = \frac{\alpha_0[1 - (\alpha_1 + \beta_1)^{\ell-1}]}{1 - \alpha_1 - \beta_1} + (\alpha_1 + \beta_1)^{\ell-1}\sigma_h^2(1).$$

Therefore, 

$$\sigma_h^2(\ell) \to \frac{\alpha_0}{1 - \alpha_1 - \beta_1}, \qquad \text{as } \ell \to \infty,$$

provided that $\alpha_1 + \beta_1 < 1$. Consequently, the multistep-ahead volatility forecasts of 
a GARCH(1,1) model converge to the unconditional variance of $a_t$ as the forecast 
horizon increases to infinity provided that $\mathrm{Var}(a_t)$ exists. 
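
The recursion in Eq. (3.17) and its limit can be sketched in a few lines of base R. The function name garch11_forecast and all numerical inputs below are hypothetical; the code only illustrates the recursion and is not the routine used by any package.

```r
# Multistep volatility forecasts of a GARCH(1,1) model via Eq. (3.17).
# a_h and sigma2_h are the shock and conditional variance at the origin h.
garch11_forecast <- function(alpha0, alpha1, beta1, a_h, sigma2_h, horizon) {
  f <- numeric(horizon)
  f[1] <- alpha0 + alpha1 * a_h^2 + beta1 * sigma2_h      # 1-step-ahead
  if (horizon > 1) {
    for (l in 2:horizon) {
      f[l] <- alpha0 + (alpha1 + beta1) * f[l - 1]        # Eq. (3.17)
    }
  }
  f
}

# Made-up inputs, for illustration only
f <- garch11_forecast(alpha0 = 0.1, alpha1 = 0.1, beta1 = 0.8,
                      a_h = 2, sigma2_h = 1.5, horizon = 500)
uncond_var <- 0.1 / (1 - 0.1 - 0.8)
print(c(f[1], f[2], f[500], uncond_var))
```

With these inputs the first two forecasts are 1.7 and 1.63, and the forecasts decay geometrically toward the unconditional variance of 1.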
The literature on GARCH models is enormous; see Bollerslev, Chou, and Kroner 
(1992), Bollerslev, Engle, and Nelson (1994), and the references therein. The model 
encounters the same weaknesses as the ARCH model. For instance, it responds 
equally to positive and negative shocks. In addition, recent empirical studies of 
high-frequency financial time series indicate that the tail behavior of GARCH 
models remains too short even with standardized Student-t innovations. For further 
information about kurtosis of GARCH models, see Section 3.16. 


3.5.1 An Illustrative Example 


The modeling procedure of ARCH models can also be used to build a GARCH 
model. However, specifying the order of a GARCH model is not easy. Only 
lower order GARCH models are used in most applications, say, GARCH(1,1), 
GARCH(2,1), and GARCH(1,2) models. The conditional maximum-likelihood 
method continues to apply provided that the starting values of the volatility $\{\sigma_t^2\}$ 
are assumed to be known. Consider, for instance, a GARCH(1,1) model. If $\sigma_1^2$ is 
treated as fixed, then $\sigma_t^2$ can be computed recursively for a GARCH(1,1) model. 
In some applications, the sample variance of $a_t$ serves as a good starting value 
of $\sigma_1^2$. The fitted model can be checked by using the standardized residual $\tilde{a}_t = a_t/\sigma_t$ 
and its squared process. 
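
The recursive computation of $\sigma_t^2$ described above can be sketched as follows, assuming a GARCH(1,1) model with Gaussian innovations and a fixed starting value. The function garch11_filter, the toy shock series, and the parameter values are all hypothetical; a real estimation routine would maximize the returned log likelihood over the parameters.

```r
# Conditional variance recursion and Gaussian conditional log likelihood
# for a GARCH(1,1) model, with sigma2[1] treated as fixed.
garch11_filter <- function(a, alpha0, alpha1, beta1, sigma2_1 = var(a)) {
  n <- length(a)
  sigma2 <- numeric(n)
  sigma2[1] <- sigma2_1                     # starting value (sample variance by default)
  for (t in 2:n) {
    sigma2[t] <- alpha0 + alpha1 * a[t - 1]^2 + beta1 * sigma2[t - 1]
  }
  loglik <- -0.5 * sum(log(2 * pi) + log(sigma2) + a^2 / sigma2)
  list(sigma2 = sigma2,
       std_resid = a / sqrt(sigma2),        # standardized residuals for model checking
       loglik = loglik)
}

# Toy shocks and made-up parameters, for illustration only
a <- c(0.5, -1.0, 0.2, 1.5, -0.3)
fit <- garch11_filter(a, alpha0 = 0.1, alpha1 = 0.1, beta1 = 0.8, sigma2_1 = 1)
print(fit$sigma2)
```

The standardized residuals returned by the sketch are the series one would pass to a Ljung-Box test, together with their squares.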


Example 3.3. In this example, we consider the monthly excess returns of 
S&P 500 index starting from 1926 for 792 observations. The series is shown in 
Figure 3.6. Denote the excess return series by $r_t$. Figure 3.7 shows the sample ACF 
of $r_t$ and the sample PACF of $r_t^2$. The $r_t$ series has some serial correlations at lags 
1 and 3, but the key feature is that the PACF of $r_t^2$ shows strong linear dependence. 
If an MA(3) model is entertained, we obtain 

$$r_t = 0.0062 + a_t + 0.0944a_{t-1} - 0.1407a_{t-3}, \qquad \hat{\sigma}_a = 0.0576$$

for the series, where all of the coefficients are significant at the 5% level. However, 
for simplicity, we use instead an AR(3) model 

$$r_t = \phi_1 r_{t-1} + \phi_2 r_{t-2} + \phi_3 r_{t-3} + \beta_0 + a_t.$$

The fitted AR(3) model, under the normality assumption, is 

$$r_t = 0.088r_{t-1} - 0.023r_{t-2} - 0.123r_{t-3} + 0.0066 + a_t, \qquad \hat{\sigma}_a^2 = 0.00333. \tag{3.18}$$

For the GARCH effects, we use the GARCH(1,1) model 

$$a_t = \sigma_t\epsilon_t, \qquad \sigma_t^2 = \alpha_0 + \beta_1\sigma_{t-1}^2 + \alpha_1 a_{t-1}^2.$$

Figure 3.6 Time series plot of monthly excess returns of S&P 500 index from 1926 to 1991. 

Figure 3.7 (a) Sample ACF of monthly excess returns of S&P 500 index and (b) sample PACF of 
squared monthly excess returns. Sample period is from 1926 to 1991. 


A joint estimation of the AR(3)-GARCH(1,1) model gives 

$$r_t = 0.0078 + 0.032r_{t-1} - 0.029r_{t-2} - 0.008r_{t-3} + a_t,$$
$$\sigma_t^2 = 0.000084 + 0.1213a_{t-1}^2 + 0.8523\sigma_{t-1}^2.$$

From the volatility equation, the implied unconditional variance of $a_t$ is 

$$\frac{0.000084}{1 - 0.8523 - 0.1213} = 0.00317,$$

which is close to that of Eq. (3.18). However, t ratios of the parameters in the 
mean equation suggest that all three AR coefficients are insignificant at the 5% 
level. Therefore, we refine the model by dropping all AR parameters. The refined 
model is 

$$r_t = 0.0076 + a_t, \qquad \sigma_t^2 = 0.000086 + 0.1216a_{t-1}^2 + 0.8511\sigma_{t-1}^2. \tag{3.19}$$


The standard error of the constant in the mean equation is 0.0015, whereas those 
of the parameters in the volatility equation are 0.000024, 0.0197, and 0.0190, 
respectively. The unconditional variance of $a_t$ is 0.000086/(1 - 0.8511 - 0.1216) 
= 0.00314. This is a simple stationary GARCH(1,1) model. Figure 3.8 shows the 
estimated volatility process, $\sigma_t$, and the standardized shocks $\tilde{a}_t = a_t/\sigma_t$ for the 
GARCH(1,1) model in Eq. (3.19). The $\tilde{a}_t$ series looks like a white noise 
process. Figure 3.9 provides the sample ACF of the standardized residuals $\tilde{a}_t$ and the 
squared process $\tilde{a}_t^2$. These ACFs fail to suggest any significant serial correlations 
or conditional heteroscedasticity in the standardized residual series. More 
specifically, we have Q(12) = 11.99(0.45) and Q(24) = 28.52(0.24) for $\tilde{a}_t$, and Q(12) 
= 13.11(0.36) and Q(24) = 26.45(0.33) for $\tilde{a}_t^2$, where the number in 
parentheses is the p value of the test statistic. Thus, the model appears to be adequate in 
describing the linear dependence in the return and volatility series. Note that the 
fitted model shows $\hat{\alpha}_1 + \hat{\beta}_1 = 0.9727$, which is close to 1. This phenomenon is 
commonly observed in practice and it leads to imposing the constraint $\alpha_1 + \beta_1 = 1$ 
in a GARCH(1,1) model, resulting in an integrated GARCH (or IGARCH) model; 
see Section 3.6. 

Figure 3.8 (a) Time series plot of estimated volatility ($\sigma_t$) for monthly excess returns of S&P 500 
index and (b) standardized shocks of monthly excess returns of S&P 500 index. Both plots are based 
on GARCH(1,1) model in Eq. (3.19). 

Figure 3.9 Model checking of GARCH(1,1) model in Eq. (3.19) for monthly excess returns of S&P 
500 index: (a) Sample ACF of standardized residuals and (b) sample ACF of the squared standardized 
residuals. 


Finally, to forecast the volatility of monthly excess returns of the S&P 500 index, 
we can use the volatility equation in Eq. (3.19). For instance, at the forecast origin 
h, we have $\sigma_{h+1}^2 = 0.000086 + 0.1216a_h^2 + 0.8511\sigma_h^2$. The 1-step-ahead forecast is 
then 

$$\sigma_h^2(1) = 0.000086 + 0.1216a_h^2 + 0.8511\sigma_h^2,$$

where $a_h$ is the residual of the mean equation at time h and $\sigma_h$ is obtained from the 
volatility equation. The starting value $\sigma_0^2$ is fixed at either zero or the unconditional 
variance of $a_t$. For multistep-ahead forecasts, we use the recursive formula in Eq. 
(3.17). Table 3.1 shows some mean and volatility forecasts for the monthly excess 
return of the S&P 500 index with forecast origin h = 792 based on the GARCH(1,1) 
model in Eq. (3.19). 

TABLE 3.1 Volatility Forecasts for Monthly Excess Returns of S&P 500 Index (a) 

Horizon     1       2       3       4       5       ∞ 
Return      0.0076  0.0076  0.0076  0.0076  0.0076  0.0076 
Volatility  0.0536  0.0537  0.0537  0.0538  0.0538  0.0560 

(a) The forecast origin is h = 792, which corresponds to December 1991. Here volatility denotes 
conditional standard deviation. 


Some S-Plus Commands Used in Example 3.3. 


> fit=garch(sp~ar(3),~garch(1,1)) 
> summary(fit) 
> fit=garch(sp~1,~garch(1,1)) 
> summary(fit) 
> names(fit) 
[1] "residuals" "sigma.t" "df.residual" "coef" "model" 
[6] "cond.dist" "likelihood" "opt.index" "cov" "prediction" 
[11] "call" "asymp.sd" "series" 
> stdresi=fit$residuals/fit$sigma.t 
> autocorTest(stdresi,lag=24) 
> autocorTest(stdresi^2,lag=24) 
> predict(fit,5) 


Note that in the prior commands the volatility series $\sigma_t$ is stored in fit$sigma.t 
and the residual series of the returns in fit$residuals. 


t Innovation 
Assuming that $\epsilon_t$ follows a standardized Student-t distribution with 5 degrees of 
freedom, we reestimate the GARCH(1,1) model and obtain 

$$r_t = 0.0085 + a_t, \qquad \sigma_t^2 = 0.00012 + 0.1121a_{t-1}^2 + 0.8432\sigma_{t-1}^2, \tag{3.20}$$

where the standard errors of the parameters are 0.0015, $0.51 \times 10^{-4}$, 0.0296, and 
0.0371, respectively. This model is essentially an IGARCH(1,1) model as $\hat{\alpha}_1 + \hat{\beta}_1 \approx 0.95$, 
which is close to 1. The Ljung-Box statistics of the standardized residuals 
give Q(10) = 11.38 with a p value of 0.33 and those of the $\{\tilde{a}_t^2\}$ series give Q(10) 
= 10.48 with a p value of 0.40. Thus, the fitted GARCH(1,1) model with Student-t 
distribution is adequate. 


S-Plus Commands Used 


> fit1 = garch(sp~1,~garch(1,1),cond.dist='t',cond.par=5, 
+ cond.est=F) 
> summary(fit1) 
> stresi=fit1$residuals/fit1$sigma.t 
> autocorTest(stresi,lag=10) 
> autocorTest(stresi^2,lag=10) 


Estimation of Degrees of Freedom 
If we further extend the GARCH(1,1) model by estimating the degrees of freedom 
of the Student-t distribution used, we obtain the model 


$$r_t = 0.0085 + a_t, \qquad \sigma_t^2 = 0.00012 + 0.1121a_{t-1}^2 + 0.8432\sigma_{t-1}^2, \tag{3.21}$$


where the estimated degrees of freedom is 7.02. Standard errors of the estimates 
in Eq. (3.21) are close to those in Eq. (3.20). The standard error of the estimated 
degrees of freedom is 1.78. Consequently, we cannot reject the hypothesis of using a 
standardized Student-t distribution with 5 degrees of freedom at the 5% significance 
level. 


S-Plus Commands Used 


> fit2 = garch(sp~1,~garch(1,1),cond.dist='t') 
> summary(fit2) 

R Commands Used in Example 3.3 

> library(fGarch) 
> sp5=scan(file='sp500.txt') % Load data 
> plot(sp5,type='l') 
% Below, fit an AR(3)+GARCH(1,1) model. 
> m1=garchFit(~arma(3,0)+garch(1,1),data=sp5,trace=F) 
> summary(m1) 
% Below, fit a GARCH(1,1) model with Student-t distribution. 
> m2=garchFit(~garch(1,1),data=sp5,trace=F,cond.dist="std") 
> summary(m2) 
% Obtain standardized residuals. 
> stresi=residuals(m2,standardize=T) 
> plot(stresi,type='l') 
> Box.test(stresi,10,type='Ljung') 
> predict(m2,5) 


3.5.2 Forecasting Evaluation 


Since the volatility of an asset return is not directly observable, comparing the 
forecasting performance of different volatility models is a challenge to data analysts. 
In the literature, some researchers use out-of-sample forecasts and compare the 
volatility forecasts $\sigma_h^2(\ell)$ with the shock $a_{h+\ell}^2$ in the forecasting sample to assess 
the forecasting performance of a volatility model. This approach often finds a 
low correlation coefficient between $a_{h+\ell}^2$ and $\sigma_h^2(\ell)$, that is, low $R^2$. However, 
such a finding is not surprising because $a_{h+\ell}^2$ alone is not an adequate measure 
of the volatility at time index $h + \ell$. Consider the 1-step-ahead forecasts. From a 
statistical point of view, $E(a_{h+1}^2 \mid F_h) = \sigma_{h+1}^2$ so that $a_{h+1}^2$ is a consistent estimate 
of $\sigma_{h+1}^2$. But it is not an accurate estimate of $\sigma_{h+1}^2$ because a single observation of a 
random variable with a known mean value cannot provide an accurate estimate of 
its variance. Consequently, such an approach to evaluate forecasting performance of 
volatility models is strictly speaking not proper. For more information concerning 
forecasting evaluation of GARCH models, readers are referred to Andersen and 
Bollerslev (1998). 
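
A small simulation makes the point concrete. The sketch below, in base R, generates a Gaussian GARCH(1,1) series with made-up parameters and compares the squared shock $a_t^2$ with the true conditional variance $\sigma_t^2$, that is, with a perfect 1-step-ahead forecast. The seed, sample size, and parameter values are arbitrary assumptions.

```r
# Even a perfect volatility forecast explains little of the variation in a_t^2.
set.seed(1)
n <- 5000
alpha0 <- 0.1; alpha1 <- 0.1; beta1 <- 0.8        # made-up parameters
eps <- rnorm(n)                                   # iid N(0,1) innovations
sigma2 <- numeric(n)
a <- numeric(n)
sigma2[1] <- alpha0 / (1 - alpha1 - beta1)        # start at the unconditional variance
a[1] <- sqrt(sigma2[1]) * eps[1]
for (t in 2:n) {
  sigma2[t] <- alpha0 + alpha1 * a[t - 1]^2 + beta1 * sigma2[t - 1]
  a[t] <- sqrt(sigma2[t]) * eps[t]
}
# R^2 of regressing a_t^2 on the true conditional variance sigma_t^2
r2 <- cor(a^2, sigma2)^2
print(r2)
```

The resulting $R^2$ is far below 1 even though the "forecast" is the true conditional variance, so a low $R^2$ against squared shocks says little about the quality of a volatility model.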


3.5.3 A Two-Pass Estimation Method 


Based on Eq. (3.15), a two-pass estimation method can be used to estimate GARCH 
models. First, ignoring any ARCH effects, one estimates the mean equation of a 
return series using the methods discussed in Chapter 2 (e.g., maximum-likelihood 
method). Denote the residual series by $a_t$. Second, treating $\{a_t^2\}$ as an observed 
time series, one applies the maximum-likelihood method to estimate parameters 
of Eq. (3.15). Denote the AR and MA coefficient estimates by $\hat{\phi}_i$ and $\hat{\theta}_i$. The 
GARCH estimates are obtained as $\hat{\beta}_i = \hat{\theta}_i$ and $\hat{\alpha}_i = \hat{\phi}_i - \hat{\theta}_i$. Obviously, such 
estimates are approximations to the true parameters and their statistical properties have 
not been rigorously investigated. However, limited experience shows that this 
simple approach often provides good approximations, especially when the sample size 
is moderate or large. For instance, consider the monthly excess return series of the 
S&P 500 index of Example 3.3. Using the conditional MLE method in SCA, we 
obtain the model 

$$r_t = 0.0061 + a_t, \qquad a_t^2 = 0.00014 + 0.9583a_{t-1}^2 + \eta_t - 0.8456\eta_{t-1},$$

where all estimates are significantly different from zero at the 5% level. From 
the estimates, we have $\hat{\beta}_1 = 0.8456$ and $\hat{\alpha}_1 = 0.9583 - 0.8456 = 0.1127$. These 
approximate estimates are very close to those in Eq. (3.19) or (3.21). 
Furthermore, the fitted volatility series of the two-pass method is very close to that of 
Figure 3.8(a). 
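
The second pass reduces to a simple mapping from ARMA(1,1) coefficients to GARCH(1,1) parameters. The sketch below uses a hypothetical helper, arma_to_garch11, and takes the sign convention of Eq. (3.15), in which the MA term enters as $-\theta_1\eta_{t-1}$.

```r
# Map ARMA(1,1) estimates for a_t^2, written as in Eq. (3.15),
#   a_t^2 = alpha0 + phi1 * a_{t-1}^2 + eta_t - theta1 * eta_{t-1},
# to GARCH(1,1) parameters: beta1 = theta1, alpha1 = phi1 - theta1.
arma_to_garch11 <- function(alpha0, phi1, theta1) {
  c(alpha0 = alpha0, alpha1 = phi1 - theta1, beta1 = theta1)
}

# Estimates quoted in the text for the S&P 500 excess returns
est <- arma_to_garch11(alpha0 = 0.00014, phi1 = 0.9583, theta1 = 0.8456)
print(est)

# Caution if the ARMA fit comes from R's stats::arima(): that function writes
# the MA part with a plus sign (so theta1 = -ma1) and reports the mean of the
# series rather than the constant alpha0.
```

The mapping reproduces $\hat{\alpha}_1 = 0.1127$ and $\hat{\beta}_1 = 0.8456$ from the text.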


3.6 THE INTEGRATED GARCH MODEL 


If the AR polynomial of the GARCH representation in Eq. (3.15) has a unit root, 
then we have an IGARCH model. Thus, IGARCH models are unit-root GARCH 
models. Similar to ARIMA models, a key feature of IGARCH models is that the 
impact of past squared shocks $\eta_{t-i} = a_{t-i}^2 - \sigma_{t-i}^2$ for i > 0 on $a_t^2$ is persistent. 
An IGARCH(1,1) model can be written as 

$$a_t = \sigma_t\epsilon_t, \qquad \sigma_t^2 = \alpha_0 + \beta_1\sigma_{t-1}^2 + (1 - \beta_1)a_{t-1}^2,$$

where $\{\epsilon_t\}$ is defined as before and $1 > \beta_1 > 0$. For the monthly excess returns of 
the S&P 500 index, an estimated IGARCH(1,1) model is 

$$r_t = 0.0067 + a_t, \qquad a_t = \sigma_t\epsilon_t,$$
$$\sigma_t^2 = 0.000119 + 0.8059\sigma_{t-1}^2 + 0.1941a_{t-1}^2,$$

where the standard errors of the estimates in the volatility equation are 0.0017, 
0.000013, and 0.0144, respectively. The parameter estimates are close to those of 
the GARCH(1,1) model shown before, but there is a major difference between 
the two models. The unconditional variance of $a_t$, hence that of $r_t$, is not defined 
under the above IGARCH(1,1) model. This seems hard to justify for an excess 
return series. From a theoretical point of view, the IGARCH phenomenon might 
be caused by occasional level shifts in volatility. The actual cause of persistence 
in volatility deserves a careful investigation. 
When $\alpha_1 + \beta_1 = 1$, repeated substitutions in Eq. (3.17) give 

$$\sigma_h^2(\ell) = \sigma_h^2(1) + (\ell - 1)\alpha_0, \qquad \ell \ge 1, \tag{3.22}$$

where h is the forecast origin. Consequently, the effect of $\sigma_h^2(1)$ on future volatilities 
is also persistent, and the volatility forecasts form a straight line with slope $\alpha_0$. 
Nelson (1990) studies some probability properties of the volatility process $\sigma_t^2$ under 
an IGARCH model. The process $\sigma_t^2$ is a martingale for which some nice results 
are available in the literature. Under certain conditions, the volatility process is 
strictly stationary but not weakly stationary because it does not have the first two 
moments. 
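
The straight-line behavior in Eq. (3.22) can be checked by running the recursion of Eq. (3.17) with $\alpha_1 + \beta_1 = 1$. In the sketch below the function name igarch11_forecast and the 1-step-ahead forecast value 0.003 are hypothetical; only $\alpha_0 = 0.000119$ is taken from the fitted model above.

```r
# IGARCH(1,1) volatility forecasts: Eq. (3.17) with alpha1 + beta1 = 1.
igarch11_forecast <- function(alpha0, sigma2_1, horizon) {
  f <- numeric(horizon)
  f[1] <- sigma2_1                               # 1-step-ahead forecast
  if (horizon > 1) {
    for (l in 2:horizon) f[l] <- alpha0 + f[l - 1]
  }
  f
}

f <- igarch11_forecast(alpha0 = 0.000119, sigma2_1 = 0.003, horizon = 6)
closed_form <- 0.003 + (0:5) * 0.000119          # Eq. (3.22)
print(cbind(recursion = f, closed_form = closed_form))
```

The two columns agree, and successive forecasts differ by exactly $\alpha_0$.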

The case of $\alpha_0 = 0$ is of particular interest in studying the IGARCH(1,1) model. 
In this case, the volatility forecasts are simply $\sigma_h^2(1)$ for all forecast horizons; 
see Eq. (3.22). This special IGARCH(1,1) model is the volatility model used in 
RiskMetrics, which is an approach for calculating value at risk; see Chapter 7. 
The model is also an exponential smoothing model for the $\{a_t^2\}$ series. To see this, 
rewrite the model as 

$$\begin{aligned}
\sigma_t^2 &= (1 - \beta_1)a_{t-1}^2 + \beta_1\sigma_{t-1}^2 \\
&= (1 - \beta_1)a_{t-1}^2 + \beta_1[(1 - \beta_1)a_{t-2}^2 + \beta_1\sigma_{t-2}^2] \\
&= (1 - \beta_1)a_{t-1}^2 + (1 - \beta_1)\beta_1 a_{t-2}^2 + \beta_1^2\sigma_{t-2}^2.
\end{aligned}$$

By repeated substitutions, we have 

$$\sigma_t^2 = (1 - \beta_1)(a_{t-1}^2 + \beta_1 a_{t-2}^2 + \beta_1^2 a_{t-3}^2 + \cdots),$$

which is the well-known exponential smoothing formation with $\beta_1$ being the 
discounting factor. Exponential smoothing methods can thus be used to estimate such 
an IGARCH(1,1) model. 
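
The exponential smoothing recursion is short enough to write out directly. The sketch below is a base R illustration with a hypothetical function name, a toy shock series, and an arbitrary discounting factor; it is not the RiskMetrics implementation.

```r
# Exponentially weighted variance: IGARCH(1,1) with alpha0 = 0,
#   sigma2[t] = (1 - beta1) * a[t-1]^2 + beta1 * sigma2[t-1].
ewma_variance <- function(a, beta1, sigma2_1 = a[1]^2) {
  n <- length(a)
  s <- numeric(n)
  s[1] <- sigma2_1                               # starting value (an assumption)
  for (t in 2:n) {
    s[t] <- (1 - beta1) * a[t - 1]^2 + beta1 * s[t - 1]
  }
  s
}

a <- c(1, 2, 0, -1)                              # toy shocks
s <- ewma_variance(a, beta1 = 0.5, sigma2_1 = 1)
print(s)
```

For this toy input the recursion gives 1, 1, 2.5, 1.25, which can be verified by hand from the first line of the derivation above.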


3.7 THE GARCH-M MODEL 


In finance, the return of a security may depend on its volatility. To model such 
a phenomenon, one may consider the GARCH-M model, where M stands for 
GARCH in the mean. A simple GARCH(1,1)-M model can be written as 

$$r_t = \mu + c\sigma_t^2 + a_t, \qquad a_t = \sigma_t\epsilon_t,$$
$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1\sigma_{t-1}^2, \tag{3.23}$$

where $\mu$ and c are constants. The parameter c is called the risk premium parameter. 
A positive c indicates that the return is positively related to its volatility. Other 
specifications of risk premium have also been used in the literature, including 
$r_t = \mu + c\sigma_t + a_t$ and $r_t = \mu + c\ln(\sigma_t^2) + a_t$. 

The formulation of the GARCH-M model in Eq. (3.23) implies that there are 
serial correlations in the return series $r_t$. These serial correlations are introduced 
by those in the volatility process $\{\sigma_t^2\}$. The existence of risk premium is, therefore, 
another reason that some historical stock returns have serial correlations. 

For illustration, we consider a GARCH(1,1)-M model with Gaussian 
innovations for the monthly excess returns of the S&P 500 index from January 1926 to 
December 1991. The fitted model is 

$$r_t = 0.0055 + 1.09\sigma_t^2 + a_t, \qquad \sigma_t^2 = 8.76 \times 10^{-5} + 0.123a_{t-1}^2 + 0.849\sigma_{t-1}^2,$$

where the standard errors for the two parameters in the mean equation are 0.0023 
and 0.818, respectively, and those for the parameters in the volatility equation are 
$2.51 \times 10^{-5}$, 0.0205, and 0.0196, respectively. The estimated risk premium for the 
index return is positive but is not statistically significant at the 5% level. Here the 
result is obtained using S-Plus. Other forms of GARCH-M specification in S-Plus 
are given in Table 3.2. The idea of risk premium applies to other GARCH models. 


TABLE 3.2 GARCH-M Models Allowed in S-Plus: 
Mean Equation Is $r_t = \mu + c\,g(\sigma_t) + a_t$ 

$g(\sigma_t)$        Command 
$\sigma_t^2$         var.in.mean 
$\sigma_t$           sd.in.mean 
$\ln(\sigma_t^2)$    logvar.in.mean 


S-Plus Demonstration 


> sp.fit = garch(sp~1+var.in.mean,~garch(1,1)) 
> summary(sp.fit) 


3.8 THE EXPONENTIAL GARCH MODEL 


To overcome some weaknesses of the GARCH model in handling financial time 
series, Nelson (1991) proposes the exponential GARCH (EGARCH) model. In 
particular, to allow for asymmetric effects between positive and negative asset 
returns, he considered the weighted innovation 

$$g(\epsilon_t) = \theta\epsilon_t + \gamma[|\epsilon_t| - E(|\epsilon_t|)], \tag{3.24}$$

where $\theta$ and $\gamma$ are real constants. Both $\epsilon_t$ and $|\epsilon_t| - E(|\epsilon_t|)$ are zero-mean iid 
sequences with continuous distributions. Therefore, $E[g(\epsilon_t)] = 0$. The asymmetry 
of $g(\epsilon_t)$ can easily be seen by rewriting it as 

$$g(\epsilon_t) = \begin{cases} (\theta + \gamma)\epsilon_t - \gamma E(|\epsilon_t|) & \text{if } \epsilon_t \ge 0, \\ (\theta - \gamma)\epsilon_t - \gamma E(|\epsilon_t|) & \text{if } \epsilon_t < 0. \end{cases}$$

Remark. For the standard Gaussian random variable $\epsilon_t$, $E(|\epsilon_t|) = \sqrt{2/\pi}$. For 
the standardized Student-t distribution in Eq. (3.7), we have 

$$E(|\epsilon_t|) = \frac{2\sqrt{v - 2}\,\Gamma[(v + 1)/2]}{(v - 1)\Gamma(v/2)\sqrt{\pi}}.$$

An EGARCH(m, s) model can be written as 

$$a_t = \sigma_t\epsilon_t, \qquad \ln(\sigma_t^2) = \alpha_0 + \frac{1 + \beta_1 B + \cdots + \beta_{s-1}B^{s-1}}{1 - \alpha_1 B - \cdots - \alpha_m B^m}\,g(\epsilon_{t-1}), \tag{3.25}$$

where $\alpha_0$ is a constant, B is the back-shift (or lag) operator such that $Bg(\epsilon_t) = g(\epsilon_{t-1})$, 
and $1 + \beta_1 B + \cdots + \beta_{s-1}B^{s-1}$ and $1 - \alpha_1 B - \cdots - \alpha_m B^m$ are polynomials 
with zeros outside the unit circle and have no common factors. By outside 
the unit circle we mean that absolute values of the zeros are greater than 1. Again, 
Eq. (3.25) uses the usual ARMA parameterization to describe the evolution of the 
conditional variance of $a_t$. Based on this representation, some properties of the 
EGARCH model can be obtained in a similar manner as those of the GARCH 
model. For instance, the unconditional mean of $\ln(\sigma_t^2)$ is $\alpha_0$. However, the model 
differs from the GARCH model in several ways. First, it uses logged conditional 
variance to relax the positiveness constraint of model coefficients. Second, the use 
of $g(\epsilon_t)$ enables the model to respond asymmetrically to positive and negative 
lagged values of $a_t$. Some additional properties of the EGARCH model can be 
found in Nelson (1991). 

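To better understand the EGARCH model, let us consider the simple model with 
order (1,1): 

$$a_t = \sigma_t\epsilon_t, \qquad (1 - \alpha B)\ln(\sigma_t^2) = (1 - \alpha)\alpha_0 + g(\epsilon_{t-1}), \tag{3.26}$$

where the $\epsilon_t$ are iid standard normal and the subscript of $\alpha_1$ is omitted. In this 
case, $E(|\epsilon_t|) = \sqrt{2/\pi}$ and the model for $\ln(\sigma_t^2)$ becomes 

$$(1 - \alpha B)\ln(\sigma_t^2) = \begin{cases} \alpha_* + (\gamma + \theta)\epsilon_{t-1} & \text{if } \epsilon_{t-1} \ge 0, \\ \alpha_* + (\gamma - \theta)(-\epsilon_{t-1}) & \text{if } \epsilon_{t-1} < 0, \end{cases} \tag{3.27}$$

where $\alpha_* = (1 - \alpha)\alpha_0 - \sqrt{2/\pi}\,\gamma$. This is a nonlinear function similar to that of the 
threshold autoregressive (TAR) model of Tong (1978, 1990). It suffices to say that 
for this simple EGARCH model the conditional variance evolves in a nonlinear 
manner depending on the sign of $a_{t-1}$. Specifically, we have 

$$\sigma_t^2 = \sigma_{t-1}^{2\alpha}\exp(\alpha_*) \begin{cases} \exp\left[(\gamma + \theta)\dfrac{a_{t-1}}{\sigma_{t-1}}\right] & \text{if } a_{t-1} \ge 0, \\ \exp\left[(\gamma - \theta)\dfrac{|a_{t-1}|}{\sigma_{t-1}}\right] & \text{if } a_{t-1} < 0. \end{cases}$$

The coefficients $(\gamma + \theta)$ and $(\gamma - \theta)$ show the asymmetry in response to 
positive and negative $a_{t-1}$. The model is, therefore, nonlinear if $\theta \ne 0$. Since 
negative shocks tend to have larger impacts, we expect $\theta$ to be negative. For higher 
order EGARCH models, the nonlinearity becomes much more complicated. Cao 
and Tsay (1992) use nonlinear models, including EGARCH models, to obtain 
multistep-ahead volatility forecasts. We discuss nonlinearity in financial time series 
in Chapter 4. 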


3.8.1 Alternative Model Form 
An alternative form for the EGARCH(m, s) model is 

$$\ln(\sigma_t^2) = \alpha_0 + \sum_{i=1}^{s}\alpha_i\frac{|a_{t-i}| + \gamma_i a_{t-i}}{\sigma_{t-i}} + \sum_{j=1}^{m}\beta_j\ln(\sigma_{t-j}^2). \tag{3.28}$$

Here a positive $a_{t-i}$ contributes $\alpha_i(1 + \gamma_i)|\epsilon_{t-i}|$ to the log volatility, whereas a 
negative $a_{t-i}$ gives $\alpha_i(1 - \gamma_i)|\epsilon_{t-i}|$, where $\epsilon_{t-i} = a_{t-i}/\sigma_{t-i}$. The $\gamma_i$ parameter thus 
signifies the leverage effect of $a_{t-i}$. Again, we expect $\gamma_i$ to be negative in real 
applications. This is the model form used in S-Plus. 


3.8.2 Illustrative Example 


Nelson (1991) applies an EGARCH model to the daily excess returns of the 
value-weighted market index from the Center for Research in Security Prices from July 
1962 to December 1987. The excess returns are obtained by removing monthly 
Treasury bill returns from the value-weighted index returns, assuming that the 
Treasury bill return was constant for each calendar day within a given month. 
There are 6408 observations. Denote the excess return by $r_t$. The model used is as 
follows: 

$$r_t = \phi_0 + \phi_1 r_{t-1} + c\sigma_t^2 + a_t, \tag{3.29}$$
$$\ln(\sigma_t^2) = \alpha_0 + \ln(1 + wN_t) + \frac{1 + \beta B}{1 - \alpha_1 B - \alpha_2 B^2}\,g(\epsilon_{t-1}),$$

where $\sigma_t^2$ is the conditional variance of $a_t$ given $F_{t-1}$, $N_t$ is the number of nontrading 
days between trading days t - 1 and t, $\alpha_0$ and w are real parameters, $g(\epsilon_t)$ is 
defined in Eq. (3.24), and $\epsilon_t$ follows a generalized error distribution in Eq. (3.10). 
Similar to a GARCH-M model, the parameter c in Eq. (3.29) is the risk premium 
parameter. Table 3.3 gives the parameter estimates and their standard errors of the 
model. The mean equation of model (3.29) has two features that are of interest. 
First, it uses an AR(1) model to take care of possible serial correlation in the excess 
returns. Second, it uses the volatility $\sigma_t^2$ as a regressor to account for risk premium. 
The estimated risk premium is negative, but statistically insignificant. 


3.8.3 Second Example 


As another illustration, we consider the monthly log returns of IBM stock from 
January 1926 to December 1997 for 864 observations. An AR(1)-EGARCH(1,1) 
model is entertained and the fitted model is 

$$r_t = 0.0105 + 0.092r_{t-1} + a_t, \qquad a_t = \sigma_t\epsilon_t, \tag{3.30}$$
$$\ln(\sigma_t^2) = -5.496 + \frac{g(\epsilon_{t-1})}{1 - 0.856B},$$
$$g(\epsilon_{t-1}) = -0.0795\epsilon_{t-1} + 0.2647\left[|\epsilon_{t-1}| - \sqrt{2/\pi}\right], \tag{3.31}$$

where $\{\epsilon_t\}$ is a sequence of independent standard Gaussian random variates. All 
parameter estimates are statistically significant at the 5% level. For model 
checking, the Ljung-Box statistics give Q(10) = 6.31(0.71) and Q(20) = 21.4(0.32) 
for the standardized residual process $\tilde{a}_t = a_t/\sigma_t$ and Q(10) = 4.13(0.90) and 
Q(20) = 15.93(0.66) for the squared process $\tilde{a}_t^2$, where again the number in 
parentheses denotes p value. Therefore, there is no serial correlation or 
conditional heteroscedasticity in the standardized residuals of the fitted model. The prior 
AR(1)-EGARCH(1,1) model is adequate. 

TABLE 3.3 Estimated AR(1)-EGARCH(2,2) Model for Daily Excess Returns of 
Value-Weighted CRSP Market Index: July 1962-December 1987 

Parameter  $\alpha_0$  $w$     $\gamma$  $\alpha_1$  $\alpha_2$  $\beta$ 
Estimate   -10.06      0.183   0.156     1.929       -0.929      -0.978 
Error      0.346       0.028   0.013     0.015       0.015       0.006 

Parameter  $\theta$    $\phi_0$               $\phi_1$  $c$      $v$ 
Estimate   -0.118      $3.5 \times 10^{-4}$   0.205     -3.361   1.576 
Error      0.009       $9.9 \times 10^{-5}$   0.012     2.026    0.032 

From the estimated volatility equation in (3.31) and using $\sqrt{2/\pi} \approx 0.7979$, we 
obtain the volatility equation as 

$$\ln(\sigma_t^2) = -1.001 + 0.856\ln(\sigma_{t-1}^2) + \begin{cases} 0.1852\epsilon_{t-1} & \text{if } \epsilon_{t-1} \ge 0, \\ -0.3442\epsilon_{t-1} & \text{if } \epsilon_{t-1} < 0. \end{cases}$$

Taking antilog transformation, we have 

$$\sigma_t^2 = \sigma_{t-1}^{2 \times 0.856}\,e^{-1.001} \times \begin{cases} e^{0.1852\epsilon_{t-1}} & \text{if } \epsilon_{t-1} \ge 0, \\ e^{-0.3442\epsilon_{t-1}} & \text{if } \epsilon_{t-1} < 0. \end{cases}$$

This equation highlights the asymmetric responses in volatility to the past positive 
and negative shocks under an EGARCH model. For example, for a standardized 
shock with magnitude 2 (i.e., two standard deviations), we have 

$$\frac{\sigma_t^2(\epsilon_{t-1} = -2)}{\sigma_t^2(\epsilon_{t-1} = 2)} = \frac{\exp[-0.3442 \times (-2)]}{\exp(0.1852 \times 2)} = e^{0.318} = 1.374.$$

Therefore, the impact of a negative shock of size 2 standard deviations is about 
37.4% higher than that of a positive shock of the same size. This example clearly 
demonstrates the asymmetric feature of EGARCH models. In general, the bigger 
the shock, the larger the difference in volatility impact. 
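
The ratio above follows directly from the weighted innovation in Eq. (3.24). A minimal base R check, using the estimates of Eq. (3.31) and a hypothetical helper name egarch_response, is:

```r
# Weighted innovation g(eps) of Eq. (3.24) for standard Gaussian innovations,
# where E|eps| = sqrt(2/pi).
egarch_response <- function(eps, theta, gamma) {
  theta * eps + gamma * (abs(eps) - sqrt(2 / pi))
}

theta <- -0.0795   # estimate from Eq. (3.31)
gamma <-  0.2647   # estimate from Eq. (3.31)

# Ratio of sigma_t^2 after a shock of -2 to sigma_t^2 after a shock of +2;
# all other terms in the volatility equation cancel.
ratio <- exp(egarch_response(-2, theta, gamma) - egarch_response(2, theta, gamma))
print(ratio)

# The gap widens with the size of the shock
ratios <- sapply(1:4, function(k)
  exp(egarch_response(-k, theta, gamma) - egarch_response(k, theta, gamma)))
print(ratios)
```

Because the $\gamma$ terms cancel, the ratio for a shock of size k equals $\exp(-2k\theta)$, which grows with k whenever $\theta$ is negative.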
Finally, we extend the sample period to include the log returns from 1998 to 
2003 so that there are 936 observations and use S-Plus to fit an EGARCH(1,1) 
model. The results are given below. 


S-Plus Demonstration 
The following output has been edited: 


> ibm.egarch=garch(ibmln~1,~egarch(1,1),leverage=T, 
+ cond.dist='ged') 
> summary(ibm.egarch) 

Call: 
garch(formula.mean = ibmln ~ 1, formula.var = ~ egarch(1, 1), 
leverage = T, cond.dist = "ged") 

Mean Equation: ibmln ~ 1 
Conditional Variance Equation: ~ egarch(1, 1) 
Conditional Distribution: ged 
with estimated parameter 1.5003 and standard error 0.09912 

           Value     Std.Error  t value  Pr(>|t|) 
C          0.01181   0.002012    5.870   3.033e-09 
A         -0.55680   0.171602   -3.245   6.088e-04 
ARCH(1)    0.22025   0.052824    4.169   1.669e-05 
GARCH(1)   0.92910   0.026743   34.742   0.000e+00 
LEV(1)    -0.26400   0.126096   -2.094   1.828e-02 

Ljung-Box test for standardized residuals: 

Statistic P-value Chi^2-d.f. 
17.87 0.1195 12 

Ljung-Box test for squared standardized residuals: 

Statistic P-value Chi^2-d.f. 
6.723 0.8754 12 

The fitted EGARCH(1,1) model is 

$$r_t = 0.0118 + a_t, \qquad a_t = \sigma_t\epsilon_t,$$
$$\ln(\sigma_t^2) = -0.557 + 0.220\,\frac{|a_{t-1}| - 0.264a_{t-1}}{\sigma_{t-1}} + 0.929\ln(\sigma_{t-1}^2), \tag{3.32}$$

where $\epsilon_t$ follows a GED distribution with parameter 1.5. This model is adequate 
based on the Ljung-Box statistics of the standardized residual series and its 
squared process. As expected, the output shows that the estimated leverage effect 
is negative and is statistically significant at the 5% level with a t ratio of -2.094. 


3.8.4 Forecasting Using an EGARCH Model 


We use the EGARCH(1,1) model to illustrate multistep-ahead forecasts of 
EGARCH models, assuming that the model parameters are known and the 
innovations are standard Gaussian. For such a model, we have 


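$$\ln(\sigma_t^2) = (1-\alpha_1)\alpha_0 + \alpha_1\ln(\sigma_{t-1}^2) + g(\epsilon_{t-1}),$$
$$g(\epsilon_{t-1}) = \theta\epsilon_{t-1} + \gamma(|\epsilon_{t-1}| - \sqrt{2/\pi}).$$

Taking exponentials, the model becomes

$$\sigma_t^2 = \sigma_{t-1}^{2\alpha_1}\exp[(1-\alpha_1)\alpha_0]\exp[g(\epsilon_{t-1})],$$
$$g(\epsilon_{t-1}) = \theta\epsilon_{t-1} + \gamma(|\epsilon_{t-1}| - \sqrt{2/\pi}). \tag{3.33}$$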
Let $h$ be the forecast origin. For the 1-step-ahead forecast, we have

$$\sigma_{h+1}^2 = \sigma_h^{2\alpha_1}\exp[(1-\alpha_1)\alpha_0]\exp[g(\epsilon_h)],$$

where all of the quantities on the right-hand side are known. Thus, the 1-step-ahead
volatility forecast at the forecast origin $h$ is simply $\hat\sigma_h^2(1) = \sigma_{h+1}^2$ given earlier. For
the 2-step-ahead forecast, Eq. (3.33) gives

$$\sigma_{h+2}^2 = \sigma_{h+1}^{2\alpha_1}\exp[(1-\alpha_1)\alpha_0]\exp[g(\epsilon_{h+1})].$$

Taking conditional expectation at time $h$, we have

$$\hat\sigma_h^2(2) = \hat\sigma_h^{2\alpha_1}(1)\exp[(1-\alpha_1)\alpha_0]\,E_h\{\exp[g(\epsilon_{h+1})]\},$$


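where $E_h$ denotes a conditional expectation taken at the time origin $h$. The prior
expectation can be obtained as follows:

$$E\{\exp[g(\epsilon)]\} = \int_{-\infty}^{\infty}\exp[\theta\epsilon + \gamma(|\epsilon| - \sqrt{2/\pi})]\,f(\epsilon)\,d\epsilon$$
$$= \exp(-\gamma\sqrt{2/\pi})\left[\int_0^{\infty} e^{(\theta+\gamma)\epsilon}\frac{1}{\sqrt{2\pi}}e^{-\epsilon^2/2}\,d\epsilon + \int_{-\infty}^{0} e^{(\theta-\gamma)\epsilon}\frac{1}{\sqrt{2\pi}}e^{-\epsilon^2/2}\,d\epsilon\right]$$
$$= \exp(-\gamma\sqrt{2/\pi})\left[e^{(\theta+\gamma)^2/2}\Phi(\theta+\gamma) + e^{(\theta-\gamma)^2/2}\Phi(\gamma-\theta)\right],$$

where $f(\epsilon)$ and $\Phi(x)$ are the probability density function and CDF of the
standard normal distribution, respectively. Consequently, the 2-step-ahead volatility
forecast is

$$\hat\sigma_h^2(2) = \hat\sigma_h^{2\alpha_1}(1)\exp[(1-\alpha_1)\alpha_0 - \gamma\sqrt{2/\pi}]$$
$$\times\,\{\exp[(\theta+\gamma)^2/2]\Phi(\theta+\gamma) + \exp[(\theta-\gamma)^2/2]\Phi(\gamma-\theta)\}.$$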


Repeating the previous procedure, we obtain a recursive formula for a $j$-step-ahead
forecast:

$$\hat\sigma_h^2(j) = \hat\sigma_h^{2\alpha_1}(j-1)\exp(\omega)$$
$$\times\,\{\exp[(\theta+\gamma)^2/2]\Phi(\theta+\gamma) + \exp[(\theta-\gamma)^2/2]\Phi(\gamma-\theta)\},$$

where $\omega = (1-\alpha_1)\alpha_0 - \gamma\sqrt{2/\pi}$. The values of $\Phi(\theta+\gamma)$ and $\Phi(\gamma-\theta)$ can be
obtained from most statistical packages. Alternatively, accurate approximations to
these values can be obtained by using the method in Appendix B of Chapter 6.
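The recursion above is easy to program. The following Python sketch is one possible implementation under the stated assumption of standard Gaussian innovations; the function names and the parameter values in the usage lines are hypothetical and are not estimates from any model in this chapter.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_exp_g(theta, gamma):
    """Closed form of E{exp[g(eps)]} for standard Gaussian eps."""
    c = math.exp(-gamma * math.sqrt(2.0 / math.pi))
    return c * (
        math.exp((theta + gamma) ** 2 / 2.0) * norm_cdf(theta + gamma)
        + math.exp((theta - gamma) ** 2 / 2.0) * norm_cdf(gamma - theta)
    )


def egarch_forecasts(sigma2_1step, alpha0, alpha1, theta, gamma, steps):
    """Return [sigma_h^2(1), ..., sigma_h^2(steps)].

    sigma2_1step is the known 1-step-ahead forecast sigma_h^2(1).
    Each further step raises the previous forecast to the power alpha1
    and multiplies by exp[(1 - alpha1) * alpha0] * E{exp[g(eps)]}.
    """
    scale = math.exp((1.0 - alpha1) * alpha0) * expected_exp_g(theta, gamma)
    forecasts = [sigma2_1step]
    for _ in range(steps - 1):
        forecasts.append(forecasts[-1] ** alpha1 * scale)
    return forecasts


# Hypothetical parameter values, for illustration only.
example = egarch_forecasts(6.0e-3, alpha0=-5.5, alpha1=0.85,
                           theta=-0.1, gamma=0.25, steps=10)
```

Because the multiplicative constant does not depend on $j$, it is computed once outside the loop.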
For illustration, consider the AR(1)-EGARCH(1,1) model of the previous
section for the monthly log returns of IBM stock, ending December 1997. Using
the fitted EGARCH(1,1) model, we can compute the volatility forecasts for the
series. At the forecast origin $t = 864$, the forecasts are $\hat\sigma_{864}^2(1) = 6.05 \times 10^{-3}$,
$\hat\sigma_{864}^2(2) = 5.82 \times 10^{-3}$, $\hat\sigma_{864}^2(3) = 5.63 \times 10^{-3}$, and $\hat\sigma_{864}^2(10) = 4.94 \times 10^{-3}$.
These forecasts converge gradually to the sample variance $4.37 \times 10^{-3}$ of the
shock process $a_t$ of Eq. (3.30).


3.9 THE THRESHOLD GARCH MODEL 


Another volatility model commonly used to handle leverage effects is the threshold 
GARCH (or TGARCH) model; see Glosten, Jagannathan, and Runkle (1993) and 
Zakoian (1994). A TGARCH(m, s) model assumes the form 


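$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{s}(\alpha_i + \gamma_i N_{t-i})a_{t-i}^2 + \sum_{j=1}^{m}\beta_j\sigma_{t-j}^2, \tag{3.34}$$

where $N_{t-i}$ is an indicator for negative $a_{t-i}$, that is,

$$N_{t-i} = \begin{cases} 1 & \text{if } a_{t-i} < 0, \\ 0 & \text{if } a_{t-i} \ge 0, \end{cases}$$

and $\alpha_i$, $\gamma_i$, and $\beta_j$ are nonnegative parameters satisfying conditions similar to those
of GARCH models. From the model, it is seen that a positive $a_{t-i}$ contributes $\alpha_i a_{t-i}^2$
to $\sigma_t^2$, whereas a negative $a_{t-i}$ has a larger impact $(\alpha_i + \gamma_i)a_{t-i}^2$ with $\gamma_i > 0$. The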
model uses zero as its threshold to separate the impacts of past shocks. Other 
threshold values can also be used; see Chapter 4 for the general concept of threshold 
models. Model (3.34) is also called the GJR model because Glosten et al. (1993) 
proposed essentially the same model. 

For illustration, consider the monthly log returns of IBM stock from 1926 to 
2003. The fitted TGARCH(1,1) model with conditional GED innovations is 


$$r_t = 0.0121 + a_t, \quad a_t = \sigma_t\epsilon_t,$$
$$\sigma_t^2 = 3.45 \times 10^{-4} + (0.0658 + 0.0843N_{t-1})a_{t-1}^2 + 0.8182\sigma_{t-1}^2, \tag{3.35}$$

where the estimated parameter of the GED is 1.51 with standard error 0.099. The
standard error of the parameter for the mean equation is 0.002 and the standard
errors of the parameters in the volatility equation are $1.26 \times 10^{-4}$, 0.0314, 0.0395, and
0.049, respectively. To check the fitted model, we have Q(12) = 18.34(0.106) for
the standardized residual $\tilde a_t$ and Q(12) = 5.36(0.95) for $\tilde a_t^2$. The model is adequate
in modeling the first two conditional moments of the log return series. Based on 
the fitted model, the leverage effect is significant at the 5% level. 
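To make the role of the indicator $N_{t-1}$ concrete, the following Python sketch evaluates the TGARCH(1,1) variance recursion for a given sequence of shocks. The function name, the starting variance, and the shock values in the usage lines are hypothetical; only the coefficients are taken from Eq. (3.35).

```python
def tgarch11_variance(shocks, alpha0, alpha1, gamma1, beta1, sigma2_init):
    """Conditional variances of a TGARCH(1,1) model.

    shocks      : list of a_1, ..., a_T
    sigma2_init : assumed value of sigma_1^2 (a starting value)
    Returns [sigma_1^2, ..., sigma_T^2].
    """
    sigma2 = [sigma2_init]
    for a_prev in shocks[:-1]:
        # N_{t-1} = 1 for a negative lagged shock, 0 otherwise.
        n_prev = 1.0 if a_prev < 0 else 0.0
        sigma2.append(
            alpha0 + (alpha1 + gamma1 * n_prev) * a_prev ** 2 + beta1 * sigma2[-1]
        )
    return sigma2


# Coefficients of Eq. (3.35); shocks and starting variance are made up.
path = tgarch11_variance([0.1, -0.1, 0.1], 3.45e-4, 0.0658, 0.0843, 0.8182, 0.005)
```

In this toy path the negative shock at the second time point raises the third variance by more than the positive shock of the same size raises the second.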
S-Plus Commands Used


> ibm.tgarch = garch(ibmln~1,~tgarch(1,1),leverage=T,
+ cond.dist='ged')

> summary(ibm.tgarch)

> plot(ibm.tgarch)


It is interesting to compare the two models in Eqs. (3.32) and (3.35) for the
monthly log returns of IBM stock. Assume that $a_{t-1} = \pm 2\sigma_{t-1}$ so that $\epsilon_{t-1} = \pm 2$.
The EGARCH(1,1) model gives

$$\frac{\sigma_t^2(\epsilon_{t-1} = -2)}{\sigma_t^2(\epsilon_{t-1} = 2)} = e^{0.22 \times 2 \times 2 \times 0.264} \approx 1.26.$$

On the other hand, ignoring the constant term 0.000345, the TGARCH(1,1) model
gives

$$\frac{\sigma_t^2(\epsilon_{t-1} = -2)}{\sigma_t^2(\epsilon_{t-1} = 2)} \approx \frac{[(0.0658 + 0.0843)4 + 0.8182]\sigma_{t-1}^2}{(0.0658 \times 4 + 0.8182)\sigma_{t-1}^2} = 1.312.$$

The two models provide similar leverage effects.
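The two ratios can be checked numerically. The sketch below uses the rounded estimates reported earlier (ARCH(1) of about 0.220 and LEV(1) of about $-0.264$ from the EGARCH output, and the coefficients of Eq. (3.35)); the helper names are hypothetical.

```python
import math


def egarch_log_term(eps):
    """Shock-dependent part of ln(sigma_t^2) in the fitted EGARCH(1,1) model."""
    return 0.220 * (abs(eps) - 0.264 * eps)


def tgarch_multiplier(eps):
    """Multiplier of sigma_{t-1}^2 in the fitted TGARCH(1,1) model when
    a_{t-1} = eps * sigma_{t-1} and the constant term is ignored."""
    n = 1.0 if eps < 0 else 0.0
    return (0.0658 + 0.0843 * n) * eps ** 2 + 0.8182


# Ratio of the volatility impact of a -2 shock to that of a +2 shock.
egarch_ratio = math.exp(egarch_log_term(-2.0) - egarch_log_term(2.0))
tgarch_ratio = tgarch_multiplier(-2.0) / tgarch_multiplier(2.0)
```

In the EGARCH ratio the constant and the lagged log-variance cancel, so only the shock-dependent term matters.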


3.10 THE CHARMA MODEL 


Many other econometric models have been proposed in the literature to describe the 
evolution of the conditional variance $\sigma_t^2$ in Eq. (3.2). We mention the conditional
heteroscedastic ARMA (CHARMA) model that uses random coefficients to produce 
conditional heteroscedasticity; see Tsay (1987). The CHARMA model is not the 
same as the ARCH model, but the two models have similar second-order conditional 
properties. A CHARMA model is defined as 


$$r_t = \mu_t + a_t, \quad a_t = \delta_{1t}a_{t-1} + \delta_{2t}a_{t-2} + \cdots + \delta_{mt}a_{t-m} + \eta_t, \tag{3.36}$$

where $\{\eta_t\}$ is a Gaussian white noise series with mean zero and variance $\sigma_\eta^2$,
$\{\delta_t\} = \{(\delta_{1t}, \ldots, \delta_{mt})'\}$ is a sequence of iid random vectors with mean zero and
nonnegative definite covariance matrix $\Omega$, and $\{\delta_t\}$ is independent of $\{\eta_t\}$. In this
section, we use some basic properties of vector and matrix operations to simplify
the presentation. Readers may consult Appendix A of Chapter 8 for a brief review
of these properties. For $m > 0$, the model can be written as

$$a_t = a_{t-1}'\delta_t + \eta_t,$$

where $a_{t-1} = (a_{t-1}, \ldots, a_{t-m})'$ is a vector of lagged values of $a_t$ and is available
at time $t - 1$. The conditional variance of $a_t$ of the CHARMA model in Eq. (3.36)
is then

$$\sigma_t^2 = \sigma_\eta^2 + a_{t-1}'\mathrm{Cov}(\delta_t)a_{t-1}$$
$$= \sigma_\eta^2 + (a_{t-1}, \ldots, a_{t-m})\,\Omega\,(a_{t-1}, \ldots, a_{t-m})'. \tag{3.37}$$

Denote the $(i, j)$th element of $\Omega$ by $\omega_{ij}$. Because the matrix is symmetric, we have
$\omega_{ij} = \omega_{ji}$. If $m = 1$, then Eq. (3.37) reduces to $\sigma_t^2 = \sigma_\eta^2 + \omega_{11}a_{t-1}^2$, which is an
ARCH(1) model. If $m = 2$, then Eq. (3.37) reduces to

$$\sigma_t^2 = \sigma_\eta^2 + \omega_{11}a_{t-1}^2 + 2\omega_{12}a_{t-1}a_{t-2} + \omega_{22}a_{t-2}^2,$$

which differs from an ARCH(2) model by the cross-product term $a_{t-1}a_{t-2}$. In
general, the conditional variance of a CHARMA($m$) model is equivalent to that
of an ARCH($m$) model if $\Omega$ is a diagonal matrix. Because $\Omega$ is a covariance
matrix, which is nonnegative definite, and $\sigma_\eta^2$ is a variance, which is positive, we
have $\sigma_t^2 \ge \sigma_\eta^2 > 0$ for all $t$. In other words, the positiveness of $\sigma_t^2$ is automatically
satisfied under a CHARMA model.

An obvious difference between ARCH and CHARMA models is that the latter 
use cross products of the lagged values of $a_t$ in the volatility equation. The
cross-product terms might be useful in some applications. For example, in modeling
an asset return series, cross-product terms denote interactions between previous 
returns. It is conceivable that stock volatility may depend on such interactions. 
However, the number of cross-product terms increases rapidly with the order m, 
and some constraints are needed to keep the model simple. A possible constraint 
is to use a small number of cross-product terms in a CHARMA model. Another 
difference between the two models is that higher order properties of CHARMA 
models are harder to obtain than those of ARCH models because it is in general 
harder to handle multiple random variables. 

For illustration, we employ the CHARMA model 


$$r_t = \phi_0 + a_t, \quad a_t = \delta_{1t}a_{t-1} + \delta_{2t}a_{t-2} + \eta_t$$

for the monthly excess returns of the S&P 500 index used before in GARCH
modeling. The fitted model is

$$r_t = 0.00635 + a_t, \quad \sigma_t^2 = 0.00179 + (a_{t-1}, a_{t-2})\,\hat\Omega\,(a_{t-1}, a_{t-2})',$$

where

$$\hat\Omega = \begin{bmatrix} 0.1417(0.0333) & -0.0594(0.0365) \\ -0.0594(0.0365) & 0.3081(0.0340) \end{bmatrix},$$

where the numbers in parentheses are standard errors. The cross-product term of
$\hat\Omega$ has a $t$ ratio of $-1.63$, which is marginally significant at the 10% level. If we
refine the model to

$$r_t = \phi_0 + a_t, \quad a_t = \delta_{1t}a_{t-1} + \delta_{2t}a_{t-2} + \delta_{3t}a_{t-3} + \eta_t,$$

but assume that $\delta_{3t}$ is uncorrelated with $(\delta_{1t}, \delta_{2t})$, then we obtain the fitted model

$$r_t = 0.0068 + a_t, \quad \sigma_t^2 = 0.00136 + (a_{t-1}, a_{t-2}, a_{t-3})\,\hat\Omega\,(a_{t-1}, a_{t-2}, a_{t-3})',$$

where the elements of $\hat\Omega$ and their standard errors, shown in parentheses, are

$$\hat\Omega = \begin{bmatrix} 0.1212(0.0355) & -0.0622(0.0283) & 0 \\ -0.0622(0.0283) & 0.1913(0.0254) & 0 \\ 0 & 0 & 0.2988(0.0420) \end{bmatrix}.$$

All of the estimates are now statistically significant at the 5% level. From the model,
$a_t = r_t - 0.0068$ is the deviation of the monthly excess return from its average.
The fitted CHARMA model shows that there is some interaction effect between
the first two lagged deviations. Indeed, the volatility equation can be written
approximately as

$$\sigma_t^2 = 0.00136 + 0.12a_{t-1}^2 - 0.12a_{t-1}a_{t-2} + 0.19a_{t-2}^2 + 0.30a_{t-3}^2.$$

The conditional variance is slightly larger when $a_{t-1}a_{t-2}$ is negative.
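The quadratic form in the fitted three-lag model can be evaluated directly. The Python sketch below uses the point estimates reported above; the function names and the lagged deviations in the usage lines are hypothetical.

```python
def quad_form(vec, mat):
    """Compute vec' * mat * vec for a list and a list of lists."""
    n = len(vec)
    return sum(vec[i] * mat[i][j] * vec[j] for i in range(n) for j in range(n))


# Point estimates of the fitted three-lag CHARMA model.
SIGMA2_ETA = 0.00136
OMEGA_HAT = [
    [0.1212, -0.0622, 0.0],
    [-0.0622, 0.1913, 0.0],
    [0.0, 0.0, 0.2988],
]


def charma_variance(a1, a2, a3):
    """Conditional variance given the lagged deviations a_{t-1}, a_{t-2}, a_{t-3}."""
    return SIGMA2_ETA + quad_form([a1, a2, a3], OMEGA_HAT)


# Hypothetical lagged deviations: same magnitudes, different sign patterns.
same_sign = charma_variance(0.05, 0.05, 0.0)
opposite_sign = charma_variance(0.05, -0.05, 0.0)
```

With these inputs the opposite-sign case gives the larger variance, because the off-diagonal element of the estimated matrix is negative.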


3.10.1 Effects of Explanatory Variables 


The CHARMA model can easily be generalized so that the volatility of $r_t$ may
depend on some explanatory variables. Let $\{x_{it}\}_{i=1}^{m}$ be $m$ explanatory variables
available at time $t$. Consider the model

$$r_t = \mu_t + a_t, \quad a_t = \sum_{i=1}^{m}\delta_{it}x_{i,t-1} + \eta_t, \tag{3.38}$$

where $\delta_t = (\delta_{1t}, \ldots, \delta_{mt})'$ and $\eta_t$ are the random vector and variable defined in Eq.
(3.36). Then the conditional variance of $a_t$ is

$$\sigma_t^2 = \sigma_\eta^2 + (x_{1,t-1}, \ldots, x_{m,t-1})\,\Omega\,(x_{1,t-1}, \ldots, x_{m,t-1})'.$$

In application, the explanatory variables may include some lagged values of $a_t$.


3.11 RANDOM COEFFICIENT AUTOREGRESSIVE MODELS 


In the literature, the random coefficient autoregressive (RCA) model is introduced 
to account for variability among different subjects under study, similar to the panel 
data analysis in econometrics and the hierarchical model in statistics. We classify 
the RCA model as a conditional heteroscedastic model, but historically it is used 
to obtain a better description of the conditional mean equation of the process by 
allowing for the parameters to evolve over time. A time series $r_t$ is said to follow
an RCA($p$) model if it satisfies

$$r_t = \phi_0 + \sum_{i=1}^{p}(\phi_i + \delta_{it})r_{t-i} + a_t, \tag{3.39}$$

where $p$ is a positive integer, $\{\delta_t\} = \{(\delta_{1t}, \ldots, \delta_{pt})'\}$ is a sequence of independent
random vectors with mean zero and covariance matrix $\Omega_\delta$, and $\{\delta_t\}$ is independent
of $\{a_t\}$; see Nicholls and Quinn (1982) for further discussions of the model. The
conditional mean and variance of the RCA model in Eq. (3.39) are

$$\mu_t = E(r_t|F_{t-1}) = \phi_0 + \sum_{i=1}^{p}\phi_i r_{t-i},$$
$$\sigma_t^2 = \sigma_a^2 + (r_{t-1}, \ldots, r_{t-p})\,\Omega_\delta\,(r_{t-1}, \ldots, r_{t-p})',$$

which is in the same form as that of a CHARMA model. However, there is a subtle
difference between RCA and CHARMA models. For the RCA model, the volatility
is a quadratic function of the observed lagged values $r_{t-i}$. Yet the volatility is a
quadratic function of the lagged innovations $a_{t-i}$ in a CHARMA model.


3.12 STOCHASTIC VOLATILITY MODEL 


An alternative approach to describe the volatility evolution of a financial time 
series is to introduce an innovation to the conditional variance equation of $a_t$; see
Melino and Turnbull (1990), Taylor (1994), Harvey, Ruiz, and Shephard (1994), and
Jacquier, Polson, and Rossi (1994). The resulting model is referred to as a stochastic
volatility (SV) model. Similar to EGARCH models, to ensure positiveness of the
conditional variance, SV models use $\ln(\sigma_t^2)$ instead of $\sigma_t^2$. An SV model is defined as

$$a_t = \sigma_t\epsilon_t, \quad (1 - \alpha_1 B - \cdots - \alpha_m B^m)\ln(\sigma_t^2) = \alpha_0 + v_t, \tag{3.40}$$

where the $\epsilon_t$ are iid $N(0, 1)$, the $v_t$ are iid $N(0, \sigma_v^2)$, $\{\epsilon_t\}$ and $\{v_t\}$ are independent,
$\alpha_0$ is a constant, and all zeros of the polynomial $1 - \sum_{i=1}^{m}\alpha_i B^i$ are greater than
1 in modulus. Adding the innovation $v_t$ substantially increases the flexibility of
the model in describing the evolution of $\sigma_t^2$, but it also increases the difficulty
in parameter estimation. To estimate an SV model, we need a quasi-likelihood
method via Kalman filtering or a Monte Carlo method. Jacquier, Polson, and Rossi
(1994) provide some comparison of estimation results between quasi-likelihood and
Markov chain Monte Carlo (MCMC) methods. The difficulty in estimating an SV
model is understandable because for each shock $a_t$ the model uses two innovations
$\epsilon_t$ and $v_t$. We discuss an MCMC method to estimate SV models in Chapter 12.
For more discussions on stochastic volatility models, see Taylor (1994). 
The appendixes of Jacquier, Polson, and Rossi (1994) provide some properties
of the SV model when $m = 1$. For instance, with $m = 1$, we have

$$\ln(\sigma_t^2) \sim N\left(\frac{\alpha_0}{1-\alpha_1}, \frac{\sigma_v^2}{1-\alpha_1^2}\right) \equiv N(\mu_h, \sigma_h^2),$$

and $E(a_t^2) = \exp(\mu_h + \sigma_h^2/2)$, $E(a_t^4) = 3\exp(2\mu_h + 2\sigma_h^2)$, and $\mathrm{corr}(a_t^2, a_{t-i}^2) =
[\exp(\sigma_h^2\alpha_1^i) - 1]/[3\exp(\sigma_h^2) - 1]$. Limited experience shows that SV models often
provide improvements in model fitting, but their contributions to out-of-sample
volatility forecasts have received mixed results.
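The $m = 1$ properties above translate into a few lines of code. The sketch below assumes $|\alpha_1| < 1$ and writes the fourth moment as $3\exp(2\mu_h + 2\sigma_h^2)$, which follows from the lognormality of $\sigma_t^2$; the function name and the parameter values in the usage lines are hypothetical.

```python
import math


def sv_moments(alpha0, alpha1, sigma_v2, lag=1):
    """Moments of a_t implied by an SV model with m = 1 and |alpha1| < 1."""
    mu_h = alpha0 / (1.0 - alpha1)
    sigma_h2 = sigma_v2 / (1.0 - alpha1 ** 2)
    second = math.exp(mu_h + sigma_h2 / 2.0)            # E(a_t^2)
    fourth = 3.0 * math.exp(2.0 * mu_h + 2.0 * sigma_h2)  # E(a_t^4)
    acf = (math.exp(sigma_h2 * alpha1 ** lag) - 1.0) / (3.0 * math.exp(sigma_h2) - 1.0)
    return {
        "mu_h": mu_h,
        "sigma_h2": sigma_h2,
        "second_moment": second,
        "fourth_moment": fourth,
        "kurtosis": fourth / second ** 2,
        "acf_squared": acf,
    }


# Hypothetical parameter values, for illustration only.
m1 = sv_moments(alpha0=-0.5, alpha1=0.9, sigma_v2=0.05, lag=1)
m5 = sv_moments(alpha0=-0.5, alpha1=0.9, sigma_v2=0.05, lag=5)
```

The implied kurtosis equals $3\exp(\sigma_h^2)$, so it exceeds 3 whenever $\sigma_v^2 > 0$, and the autocorrelation of $a_t^2$ decays as the lag grows.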


3.13 LONG-MEMORY STOCHASTIC VOLATILITY MODEL 


More recently, the SV model is further extended to allow for long memory in 
volatility, using the idea of fractional difference. As stated in Chapter 2, a time 
series is a long-memory process if its autocorrelation function decays at a hyper- 
bolic, instead of an exponential, rate as the lag increases. The extension to long- 
memory models in volatility study is motivated by the fact that the autocorrelation 
function of the squared or absolute-valued series of an asset return often decays 
slowly, even though the return series has no serial correlation; see Ding, Granger, 
and Engle (1993). Figure 3.10 shows the sample ACF of the daily absolute returns 
for IBM stock and the S&P 500 index from July 3, 1962, to December 31, 2003. 
These sample ACFs are positive with moderate magnitude but decay slowly. 
A simple long-memory stochastic volatility (LMSV) model can be written as 


$$a_t = \sigma_t\epsilon_t, \quad \sigma_t = \sigma\exp(u_t/2), \quad (1 - B)^d u_t = \eta_t, \tag{3.41}$$

where $\sigma > 0$, the $\epsilon_t$ are iid $N(0, 1)$, the $\eta_t$ are iid $N(0, \sigma_\eta^2)$ and independent of $\epsilon_t$,
and $0 < d < 0.5$. The feature of long memory stems from the fractional difference
$(1 - B)^d$, which implies that the ACF of $u_t$ decays slowly at a hyperbolic, instead
of an exponential, rate as the lag increases. For model (3.41), we have

$$\ln(a_t^2) = \ln(\sigma^2) + u_t + \ln(\epsilon_t^2)$$
$$= [\ln(\sigma^2) + E(\ln\epsilon_t^2)] + u_t + [\ln(\epsilon_t^2) - E(\ln\epsilon_t^2)]$$
$$\equiv \mu + u_t + e_t.$$

Thus, the $\ln(a_t^2)$ series is a Gaussian long-memory signal plus a non-Gaussian white
noise; see Breidt, Crato, and de Lima (1998). Estimation of the LMSV model is 
complicated, but the fractional difference parameter d can be estimated by using 
either a quasi-maximum-likelihood method or a regression method. Using the log 
series of squared daily returns for companies in the S&P 500 index, Bollerslev and 
Jubinski (1999) and Ray and Tsay (2000) found that the median estimate of d is 
about 0.38. For applications, Ray and Tsay (2000) studied common long-memory
components in daily stock volatilities of groups of companies classified by various
characteristics. They found that companies in the same industrial or business sector
tend to have more common long-memory components (e.g., big U.S. national banks
and financial institutions).

[Figure 3.10: two panels, (a) and (b), each plotting ACF against Lag from 0 to 200.]

Figure 3.10 Sample ACF of daily absolute log returns for (a) S&P 500 index and (b) IBM stock for
period from July 3, 1962, to December 31, 2003. Two horizontal lines denote asymptotic 5% limits.
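To see how the fractional difference in Eq. (3.41) generates slowly decaying dependence, one can expand $(1 - B)^{-d}$ as $\sum_k \psi_k B^k$ with $\psi_0 = 1$ and $\psi_k = \psi_{k-1}(k - 1 + d)/k$. The Python sketch below computes these weights and simulates an LMSV series from a truncated expansion. The truncation length, seed, and parameter values are assumptions made for illustration, and the truncation is only an approximation to the infinite sum.

```python
import math
import random


def ma_weights(d, n):
    """First n coefficients psi_k of (1 - B)^(-d)."""
    psi = [1.0]
    for k in range(1, n):
        psi.append(psi[-1] * (k - 1 + d) / k)
    return psi


def simulate_lmsv(n, d, sigma, sigma_eta, trunc=500, seed=0):
    """Simulate a_1, ..., a_n from the LMSV model using a truncated
    moving-average representation of u_t (an approximation)."""
    rng = random.Random(seed)
    psi = ma_weights(d, trunc)
    eta = [rng.gauss(0.0, sigma_eta) for _ in range(n + trunc)]
    series = []
    for t in range(trunc, n + trunc):
        u_t = sum(psi[k] * eta[t - k] for k in range(trunc))
        series.append(sigma * math.exp(u_t / 2.0) * rng.gauss(0.0, 1.0))
    return series


# Hypothetical settings, for illustration only.
weights = ma_weights(0.4, 200)
sample = simulate_lmsv(n=100, d=0.4, sigma=1.0, sigma_eta=0.5, trunc=200)
```

For $0 < d < 0.5$ the weights are positive and shrink slowly with $k$, which is the source of the hyperbolic decay mentioned in the text.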


3.14 APPLICATION 


In this section, we apply the volatility models discussed in this chapter to investigate 
some problems of practical importance. The data used are the monthly log returns 
of IBM stock and the S&P 500 index from January 1926 to December 1999. There 
are 888 observations, and the returns are in percentages and include dividends. 
Figure 3.11 shows the time plots of the two return series. Note that the result of 
this section was obtained by the RATS program. 


Example 3.4. The questions we address here are whether the daily volatility 
of a stock is lower in the summer and, if so, by how much. Affirmative answers 
to these two questions have practical implications in stock option pricing. We use 
the monthly log returns of IBM stock shown in Figure 3.11(a) as an illustrative 
example. 
Figure 3.11 Time plots of monthly log returns for (a) IBM stock and (b) S&P 500 index. Sample
period is from January 1926 to December 1999. Returns are in percentages and include dividends. 


Denote the monthly log return series by $r_t$. If Gaussian GARCH models are
entertained, we obtain the GARCH(1,1) model:

$$r_t = 1.23 + 0.099r_{t-1} + a_t, \quad a_t = \sigma_t\epsilon_t,$$
$$\sigma_t^2 = 3.206 + 0.103a_{t-1}^2 + 0.825\sigma_{t-1}^2, \tag{3.42}$$

for the series. The standard errors of the two parameters in the mean equation
are 0.222 and 0.037, respectively, whereas those of the parameters in the volatility
equation are 0.947, 0.021, and 0.037, respectively. Using the standardized residuals
$\tilde a_t = a_t/\sigma_t$, we obtain Q(10) = 7.82(0.553) and Q(20) = 21.22(0.325), where the
$p$ value is in parentheses. Therefore, there are no serial correlations in the residuals
of the mean equation. The Ljung-Box statistics of the $\tilde a_t^2$ series show Q(10) =
2.89(0.98) and Q(20) = 7.26(0.99), indicating that the standardized residuals have
no conditional heteroscedasticity. The fitted model seems adequate. This model
serves as a starting point for further study.

To study the summer effect on stock volatility of an asset, we define an indicator
variable

$$u_t = \begin{cases} 1 & \text{if } t \text{ is June, July, or August}, \\ 0 & \text{otherwise}, \end{cases} \tag{3.43}$$

and modify the volatility equation to

$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1\sigma_{t-1}^2 + u_t(\alpha_{00} + \alpha_{10}a_{t-1}^2 + \beta_{10}\sigma_{t-1}^2).$$

This equation uses two GARCH(1,1) models to describe the volatility of a stock
return; one model for the summer months and the other for the remaining months.
For the monthly log returns of IBM stock, estimation results show that the estimates
of $\alpha_{10}$ and $\beta_{10}$ are statistically nonsignificant at the 10% level. Therefore, we refine
the equation and obtain the model


$$r_t = 1.21 + 0.099r_{t-1} + a_t, \quad a_t = \sigma_t\epsilon_t,$$
$$\sigma_t^2 = 4.539 + 0.113a_{t-1}^2 + 0.816\sigma_{t-1}^2 - 5.154u_t. \tag{3.44}$$

The standard errors of the parameters in the mean equation are 0.218 and 0.037,
respectively, and those of the parameters in the volatility equation are 1.071, 0.022,
0.037, and 1.900, respectively. The Ljung-Box statistics for the standardized
residuals $\tilde a_t = a_t/\sigma_t$ show Q(10) = 7.66(0.569) and Q(20) = 21.64(0.302). Therefore,
there are no serial correlations in the standardized residuals. The Ljung-Box
statistics for $\tilde a_t^2$ give Q(10) = 3.38(0.97) and Q(20) = 6.82(0.99), indicating
no conditional heteroscedasticity in the standardized residuals either. The refined
model seems adequate.

Comparing the volatility models in Eqs. (3.42) and (3.44), we obtain the following
conclusions. First, because the coefficient $-5.154$ is significantly different from
zero with a $p$ value of 0.0067, the summer effect on stock volatility is statistically
significant at the 1% level. Furthermore, the negative sign of the estimate confirms
that the volatility of IBM monthly log stock returns is indeed lower during the
summer. Second, rewrite the volatility model in Eq. (3.44) as

$$\sigma_t^2 = \begin{cases} -0.615 + 0.113a_{t-1}^2 + 0.816\sigma_{t-1}^2 & \text{if } t \text{ is June, July, or August}, \\ 4.539 + 0.113a_{t-1}^2 + 0.816\sigma_{t-1}^2 & \text{otherwise}. \end{cases}$$

The negative constant term $-0.615 = 4.539 - 5.154$ is counterintuitive. However,
since the standard errors of 4.539 and 5.154 are relatively large, the estimated
difference $-0.615$ might not be significantly different from zero. To verify the
assertion, we refit the model by imposing the constraint that the constant term of
the volatility equation is zero for the summer months. This can easily be done by
using the equation

$$\sigma_t^2 = \alpha_1 a_{t-1}^2 + \beta_1\sigma_{t-1}^2 + \gamma(1 - u_t).$$


The fitted model is 


$$r_t = 1.21 + 0.099r_{t-1} + a_t, \quad a_t = \sigma_t\epsilon_t,$$
$$\sigma_t^2 = 0.114a_{t-1}^2 + 0.811\sigma_{t-1}^2 + 4.552(1 - u_t). \tag{3.45}$$

The standard errors of the parameters in the mean equation are 0.219 and 0.038,
respectively, and those of the parameters in the volatility equation are 0.022, 0.034,
and 1.094, respectively. The Ljung-Box statistics of the standardized residuals
show Q(10) = 7.68 and Q(20) = 21.67, and those of the $\tilde a_t^2$ series give Q(10) =
3.17 and Q(20) = 6.85. These test statistics are close to what we had before and
are not significant at the 5% level.

The volatility Eq. (3.45) can readily be used to assess the summer effect on the 
IBM stock volatility. For illustration, based on the model in Eq. (3.45) the medians
of $a_t^2$ and $\sigma_t^2$ are 29.4 and 75.1, respectively, for the IBM monthly log returns
in 1999. Using these values, we have $\sigma_t^2 = 0.114 \times 29.4 + 0.811 \times 75.1 = 64.3$
for the summer months and $\sigma_t^2 = 68.8$ for the other months. The ratio of the two
volatilities is $64.3/68.8 \approx 93\%$. Thus, there is a 7% reduction in the volatility of
the monthly log return of IBM stock in the summer months.
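The arithmetic behind the 7% figure can be reproduced as follows. The sketch plugs the reported medians into Eq. (3.45); the function name is hypothetical.

```python
def sigma2_eq345(a2_prev, sigma2_prev, summer):
    """Conditional variance from Eq. (3.45); summer=True sets u_t = 1."""
    u_t = 1.0 if summer else 0.0
    return 0.114 * a2_prev + 0.811 * sigma2_prev + 4.552 * (1.0 - u_t)


# Medians of a_t^2 and sigma_t^2 reported for 1999.
summer_var = sigma2_eq345(29.4, 75.1, summer=True)
other_var = sigma2_eq345(29.4, 75.1, summer=False)
ratio = summer_var / other_var
```

Note that the two values differ only by the constant 4.552, so the ratio depends on the level of volatility at which it is evaluated.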


Example 3.5. The S&P 500 index is widely used in the derivative markets. 
As such, modeling its volatility is a subject of intensive study. The question we 
ask in this example is whether the past returns of individual components of the 
index contribute to the modeling of the S&P 500 index volatility in the presence 
of its own returns. A thorough investigation on this topic is beyond the scope of 
this chapter, but we use the past returns of IBM stock as explanatory variables to 
address the question. 

The data used are shown in Figure 3.11. Denote by $r_t$ the monthly log return
series of the S&P 500 index. Using the $r_t$ series and Gaussian GARCH models,
we obtain the following special GARCH(2,1) model:

$$r_t = 0.609 + a_t, \quad a_t = \sigma_t\epsilon_t, \quad \sigma_t^2 = 0.717 + 0.147a_{t-2}^2 + 0.839\sigma_{t-1}^2. \tag{3.46}$$

The standard error of the constant term in the mean equation is 0.138, and those of
the parameters in the volatility equation are 0.214, 0.021, and 0.017, respectively.
Based on the standardized residuals $\tilde a_t = a_t/\sigma_t$, we have Q(10) = 11.51(0.32) and
Q(20) = 23.71(0.26), where the number in parentheses denotes the $p$ value. For
the $\tilde a_t^2$ series, we have Q(10) = 9.42(0.49) and Q(20) = 13.01(0.88). Therefore,
the model seems adequate at the 5% significance level.

Next, we evaluate the contributions, if any, of using the past returns of IBM 
stock, which is a component of the S&P 500 index, in modeling the index volatility. 
As a simple illustration, we modify the volatility equation as 


$$\sigma_t^2 = \alpha_0 + \alpha_2 a_{t-2}^2 + \beta_1\sigma_{t-1}^2 + \gamma(x_{t-1} - 1.24)^2,$$

where $x_t$ is the monthly log return of IBM stock and 1.24 is the sample mean of
$x_t$. The fitted model for $r_t$ becomes

$$r_t = 0.616 + a_t, \quad a_t = \sigma_t\epsilon_t,$$
$$\sigma_t^2 = 1.069 + 0.148a_{t-2}^2 + 0.834\sigma_{t-1}^2 - 0.007(x_{t-1} - 1.24)^2. \tag{3.47}$$

The standard error of the parameter in the mean equation is 0.139 and the standard
errors of the parameters in the volatility equation are 0.271, 0.020, 0.018, and
0.002, respectively. For model checking, we have Q(10) = 11.39(0.33) and Q(20)
= 23.63(0.26) for the standardized residuals $\tilde a_t = a_t/\sigma_t$ and Q(10) = 9.35(0.50)
and Q(20) = 13.51(0.85) for the $\tilde a_t^2$ series. Therefore, the model is adequate.
Since the $p$ value for testing $\gamma = 0$ is 0.0039, the contribution of the lag-1
IBM stock return to the S&P 500 index volatility is statistically significant at the
1% level. The negative sign is understandable because it implies that using the
lag-1 past return of IBM stock reduces the volatility of the S&P 500 index return.
Table 3.4 gives the fitted volatility of the S&P 500 index from July to December
of 1999 using models (3.46) and (3.47). From the table, the past value of IBM log
stock return indeed contributes to the modeling of the S&P 500 index volatility.

TABLE 3.4 Fitted Volatilities for Monthly Log Returns of S&P 500 Index from July
to December 1999 Using Models with and without Past Log Return of IBM Stock

Month          7/99   8/99   9/99  10/99  11/99  12/99
Model (3.46)  26.30  26.01  24.73  21.69  20.71  22.46
Model (3.47)  23.32  23.13  22.46  20.00  19.45  18.27


3.15 ALTERNATIVE APPROACHES 


In this section, we discuss two alternative methods to volatility modeling. 


3.15.1 Use of High-Frequency Data 


French, Schwert, and Stambaugh (1987) consider an alternative approach for 
volatility estimation that uses high-frequency data to calculate volatility of 
low-frequency returns. In recent years, this approach has attracted substantial 
interest due to the availability of high-frequency financial data; see Andersen, 
Bollerslev, Diebold, and Labys (2001a, 2001b). 

Suppose that we are interested in the monthly volatility of an asset for which
daily returns are available. Let $r_t^m$ be the monthly log return of the asset at month
$t$. Assume that there are $n$ trading days in month $t$ and the daily log returns of the
asset in the month are $\{r_{t,i}\}_{i=1}^{n}$. Using properties of log returns, we have

$$r_t^m = \sum_{i=1}^{n} r_{t,i}.$$

Assuming that the conditional variance and covariance exist, we have

$$\mathrm{Var}(r_t^m|F_{t-1}) = \sum_{i=1}^{n}\mathrm{Var}(r_{t,i}|F_{t-1}) + 2\sum_{i<j}\mathrm{Cov}[(r_{t,i}, r_{t,j})|F_{t-1}], \tag{3.48}$$

where $F_{t-1}$ denotes the information available at month $t - 1$ (inclusive). The prior
equation can be simplified if additional assumptions are made. For example, if we
assume that $\{r_{t,i}\}$ is a white noise series, then

$$\mathrm{Var}(r_t^m|F_{t-1}) = n\,\mathrm{Var}(r_{t,1}),$$

where $\mathrm{Var}(r_{t,1})$ can be estimated from the daily returns $\{r_{t,i}\}_{i=1}^{n}$ by

$$\hat\sigma^2 = \frac{\sum_{i=1}^{n}(r_{t,i} - \bar r_t)^2}{n-1},$$

where $\bar r_t$ is the sample mean of the daily log returns in month $t$ [i.e.,
$\bar r_t = (\sum_{i=1}^{n} r_{t,i})/n$]. The estimated monthly volatility is then

$$\hat\sigma_m^2 = \frac{n}{n-1}\sum_{i=1}^{n}(r_{t,i} - \bar r_t)^2. \tag{3.49}$$

If $\{r_{t,i}\}$ follows an MA(1) model, then

$$\mathrm{Var}(r_t^m|F_{t-1}) = n\,\mathrm{Var}(r_{t,1}) + 2(n-1)\mathrm{Cov}(r_{t,1}, r_{t,2}),$$

which can be estimated by

$$\hat\sigma_m^2 = \frac{n}{n-1}\sum_{i=1}^{n}(r_{t,i} - \bar r_t)^2 + 2\sum_{i=1}^{n-1}(r_{t,i} - \bar r_t)(r_{t,i+1} - \bar r_t). \tag{3.50}$$


i=l 


The previous approach for volatility estimation is simple, but it encounters several
difficulties in practice. First, the model for daily returns $\{r_{t,i}\}$ is unknown. This
complicates the estimation of covariances in Eq. (3.48). Second, there are roughly
21 trading days in a month, resulting in a small sample size. The accuracy of
the estimates of variance and covariance in Eq. (3.48) might be questionable. The
accuracy depends on the dynamic structure of $\{r_{t,i}\}$ and their distribution. If the
daily log returns have high excess kurtosis and serial correlations, then the sample
estimates $\hat\sigma_m^2$ in Eqs. (3.49) and (3.50) may not even be consistent; see Bai, Russell,
and Tiao (2004). Further research is needed to make this approach valuable.
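The two estimators in Eqs. (3.49) and (3.50) are simple to compute from one month of daily log returns. The Python sketch below is a direct transcription; the function names and the return values in the usage lines are hypothetical.

```python
def monthly_vol_white_noise(daily):
    """Eq. (3.49): n/(n-1) times the sum of squared deviations."""
    n = len(daily)
    rbar = sum(daily) / n
    return n / (n - 1) * sum((r - rbar) ** 2 for r in daily)


def monthly_vol_ma1(daily):
    """Eq. (3.50): Eq. (3.49) plus twice the sum of lag-1 cross products."""
    n = len(daily)
    rbar = sum(daily) / n
    cross = sum((daily[i] - rbar) * (daily[i + 1] - rbar) for i in range(n - 1))
    return monthly_vol_white_noise(daily) + 2.0 * cross


# Made-up daily log returns (in percent) for a very short "month".
returns = [1.0, 2.0, 3.0]
v_wn = monthly_vol_white_noise(returns)
v_ma1 = monthly_vol_ma1(returns)
```

One practical caveat of Eq. (3.50) is that the cross-product term can be negative, so with strongly alternating daily returns the estimate itself may turn out negative; the sketch does not guard against that case.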


Example 3.6. Consider the monthly volatility of the log returns of the S&P
500 index from January 1980 to December 1999. We calculate the volatility by
three methods. In the first method, we use daily log returns and Eq. (3.49) (i.e.,
assuming that the daily log returns form a white noise series). The second method
also uses daily returns but assumes an MA(1) model [i.e., using Eq. (3.50)]. The
third method applies a GARCH(1,1) model to the monthly returns from January
1962 to December 1999. We use a longer data span to obtain a more accurate
estimate of the monthly volatility. The GARCH(1,1) model used is

$$r_t^m = 0.658 + a_t, \quad a_t = \sigma_t\epsilon_t, \quad \sigma_t^2 = 3.349 + 0.086a_{t-1}^2 + 0.735\sigma_{t-1}^2,$$

where $\epsilon_t$ is a standard Gaussian white noise series. Figure 3.12 shows the time plots
of the estimated monthly volatility. Clearly the estimated volatilities based on daily
returns are much higher than those based on monthly returns and a GARCH(1,1)
model. In particular, the estimated volatility for October 1987 was about 680 when
daily returns are used. The plots shown were truncated to have the same scale.

Figure 3.12 Time plots of estimated monthly volatility for log returns of S&P 500 index from January
1980 to December 1999: (a) assumes that daily log returns form a white noise series, (b) assumes that
daily log returns follow an MA(1) model, and (c) uses monthly returns from January 1962 to December
1999 and a GARCH(1,1) model.

In Eq. (3.49), if we further assume that the sample mean $\bar r_t$ is zero, then we have
$\hat\sigma_m^2 \approx \sum_{i=1}^{n} r_{t,i}^2$. In this case, the cumulative sum of squares of daily log returns
in a month is used as an estimate of monthly volatility. This concept has been
generalized to estimate daily volatility of an asset by using intradaily log returns.
Let $r_t$ be the daily log return of an asset. Suppose that there are $n$ equally spaced
intradaily log returns available such that $r_t = \sum_{i=1}^{n} r_{t,i}$. The quantity

$$RV_t = \sum_{i=1}^{n} r_{t,i}^2$$

is called the realized volatility of $r_t$; see Andersen et al. (2001a,b). Mathematically,
realized volatility is a quadratic variation of $r_t$, and it assumes that $\{r_{t,i}\}_{i=1}^{n}$ forms
an iid sequence with mean zero and finite variance. Limited experience indicates
that $\ln(RV_t)$ often follows approximately a Gaussian ARIMA(0,1,$q$) model, which
can be used to produce forecasts. See demonstration in Section 1.1 for further
information.

Advantages of realized volatility include simplicity and making use of intradaily
returns. Intuitively, one would like to use as much information as possible by
choosing a large $n$. However, when the time interval between $r_{t,i}$ is small, the
returns are subject to the effects of market microstructure, for example, bid-ask
bounce, which often results in a biased estimate of the volatility. The problem of
choosing an optimal time interval for constructing realized volatility has attracted
much research lately. For heavily traded assets in the United States, a time interval
of 4-15 minutes is often used. Another problem of using realized volatility for
stock returns is that the overnight return, which is the return from the closing price
of day $t - 1$ to the opening price of day $t$, tends to be substantial. Ignoring overnight
returns can seriously underestimate the volatility. On the other hand, our limited
experience shows that overnight returns appear to be small for index returns or
foreign exchange returns.
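The effect of the sampling interval can be explored with a small routine that computes realized volatility from a list of equally spaced intraday log prices, keeping every `step`-th observation. This is a minimal sketch; the function name and the toy prices are hypothetical, and overnight returns are not included.

```python
def realized_volatility(log_prices, step=1):
    """Sum of squared log returns computed from every step-th log price.

    Observations after the last sampled price are dropped when
    len(log_prices) - 1 is not a multiple of step.
    """
    sampled = log_prices[::step]
    return sum((b - a) ** 2 for a, b in zip(sampled, sampled[1:]))


# Toy intraday log prices with a reversal, loosely mimicking bid-ask bounce.
toy = [0.0, 0.01, 0.0, 0.02]
rv_fine = realized_volatility(toy, step=1)
rv_coarse = realized_volatility(toy, step=3)
```

In the toy series the fine grid picks up the back-and-forth movement and produces a larger value than the coarse grid, which illustrates why very short intervals can inflate the estimate.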

In a series of recent articles, Barndorff-Nielsen and Shephard (2004) have used 
high-frequency returns to study bi-power variations of an asset return and developed 
some methods to detect jumps in volatility. 


3.15.2 Use of Daily Open, High, Low, and Close Prices 


For many assets, daily opening, high, low, and closing prices are available. Parkin- 
son (1980), Garman and Klass (1980), Rogers and Satchell (1991), and Yang and 
Zhang (2000) showed that one can use such information to improve volatility
estimation. Figure 3.13 shows a time plot of price versus time for the $t$th trading day,
assuming that time is continuous. For an asset, define the following variables:

• $C_t$ = closing price of the $t$th trading day.
• $O_t$ = opening price of the $t$th trading day.
• $f$ = fraction of the day (in interval [0,1]) that trading is closed.
• $H_t$ = highest price of the $t$th trading period.
• $L_t$ = lowest price of the $t$th trading period.
• $F_{t-1}$ = public information available at time $t - 1$.


[Figure 3.13: price (vertical axis) against time from 0 to 1 (horizontal axis); trading is closed before time f and open afterward, and C(t-1) is marked on the price axis.]

Figure 3.13 Time plot of price over time: scale for price is arbitrary.


The conventional variance (or volatility) is $\sigma_t^2 = E[(C_t - C_{t-1})^2 | F_{t-1}]$. Garman
and Klass (1980) considered several estimates of $\sigma_t^2$ assuming that the price follows
a simple diffusion model without drift; see Chapter 6 for more information about
stochastic diffusion models. The estimators considered include:

• $\hat{\sigma}_{0,t}^2 = (C_t - C_{t-1})^2$.
• $\hat{\sigma}_{1,t}^2 = \dfrac{(O_t - C_{t-1})^2}{2f} + \dfrac{(C_t - O_t)^2}{2(1-f)}, \quad 0 < f < 1$.
• $\hat{\sigma}_{2,t}^2 = \dfrac{(H_t - L_t)^2}{4\ln(2)} \approx 0.3607 (H_t - L_t)^2$.
• $\hat{\sigma}_{3,t}^2 = 0.17\,\dfrac{(O_t - C_{t-1})^2}{f} + 0.83\,\dfrac{(H_t - L_t)^2}{(1-f)\,4\ln(2)}, \quad 0 < f < 1$.
• $\hat{\sigma}_{5,t}^2 = 0.5(H_t - L_t)^2 - [2\ln(2) - 1](C_t - O_t)^2$, which is
  $\approx 0.5(H_t - L_t)^2 - 0.386(C_t - O_t)^2$.
• $\hat{\sigma}_{6,t}^2 = 0.12\,\dfrac{(O_t - C_{t-1})^2}{f} + 0.88\,\dfrac{\hat{\sigma}_{5,t}^2}{1-f}, \quad 0 < f < 1$.

A more precise, but complicated, estimator $\hat{\sigma}_{4,t}^2$ was also considered. However, it
is close to $\hat{\sigma}_{5,t}^2$. Defining the efficiency factor of a volatility estimator as

$$\mathrm{Eff}(\hat{\sigma}_{i,t}^2) = \frac{\mathrm{Var}(\hat{\sigma}_{0,t}^2)}{\mathrm{Var}(\hat{\sigma}_{i,t}^2)},$$

Garman and Klass (1980) found that $\mathrm{Eff}(\hat{\sigma}_{i,t}^2)$ is approximately 2, 5.2, 6.2, 7.4,
and 8.4 for $i$ = 1, 2, 3, 5, and 6, respectively, for the simple diffusion model
entertained. Note that $\hat{\sigma}_{2,t}^2$ was derived by Parkinson (1980) with $f = 0$.
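As a rough illustration, the single-day estimators listed above can be coded directly from the four daily prices and the previous close. The following is a minimal Python sketch and not part of the original text; the function name `garman_klass_estimates` and the sample prices are hypothetical, and the weights 0.17/0.83 and 0.12/0.88 are taken to be those reported by Garman and Klass (1980).

```python
import math

def garman_klass_estimates(c_prev, o, h, l, c, f):
    """Single-day variance estimates from open/high/low/close prices.

    c_prev is the previous day's close and f in (0, 1) is the fraction of
    the day that trading is closed. Keys follow the estimator index i.
    """
    if not 0.0 < f < 1.0:
        raise ValueError("f must lie strictly between 0 and 1")
    overnight = (o - c_prev) ** 2      # squared close-to-open move
    intraday = (c - o) ** 2            # squared open-to-close move
    rng = (h - l) ** 2                 # squared daily range
    four_ln2 = 4.0 * math.log(2.0)
    s5 = 0.5 * rng - (2.0 * math.log(2.0) - 1.0) * intraday
    return {
        0: (c - c_prev) ** 2,
        1: overnight / (2.0 * f) + intraday / (2.0 * (1.0 - f)),
        2: rng / four_ln2,
        3: 0.17 * overnight / f + 0.83 * rng / ((1.0 - f) * four_ln2),
        5: s5,
        6: 0.12 * overnight / f + 0.88 * s5 / (1.0 - f),
    }

# Hypothetical prices for one trading day.
est = garman_klass_estimates(c_prev=30.0, o=31.0, h=33.0, l=30.5, c=32.0, f=0.3)
```

Each value is a noisy one-day estimate; in practice one would average the chosen estimator over many days before interpreting it.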

Turn to log returns. Define the following:

• $o_t = \ln(O_t) - \ln(C_{t-1})$, the normalized open.
• $u_t = \ln(H_t) - \ln(O_t)$, the normalized high.
• $d_t = \ln(L_t) - \ln(O_t)$, the normalized low.
• $c_t = \ln(C_t) - \ln(O_t)$, the normalized close.


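Suppose that there are $n$ days of data available and the volatility is constant over
the period. Yang and Zhang (2000) recommend the estimate

$$\hat{\sigma}_{yz}^2 = \hat{\sigma}_o^2 + k\hat{\sigma}_c^2 + (1 - k)\hat{\sigma}_{rs}^2$$

as a robust estimator of the volatility, where

$$\hat{\sigma}_o^2 = \frac{1}{n-1}\sum_{t=1}^{n}(o_t - \bar{o})^2 \quad \text{with} \quad \bar{o} = \frac{1}{n}\sum_{t=1}^{n} o_t,$$

$$\hat{\sigma}_c^2 = \frac{1}{n-1}\sum_{t=1}^{n}(c_t - \bar{c})^2 \quad \text{with} \quad \bar{c} = \frac{1}{n}\sum_{t=1}^{n} c_t,$$

$$\hat{\sigma}_{rs}^2 = \frac{1}{n}\sum_{t=1}^{n}[u_t(u_t - c_t) + d_t(d_t - c_t)],$$

$$k = \frac{0.34}{1.34 + (n+1)/(n-1)}.$$

The estimate $\hat{\sigma}_{rs}^2$ was proposed by Rogers and Satchell (1991), and the quantity
$k$ is chosen to minimize the variance of the estimator $\hat{\sigma}_{yz}^2$, which is a linear
combination of three estimates.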
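The Yang and Zhang estimate can be sketched in a few lines. The Python function below is illustrative only and is not from the original text; the name `yang_zhang_variance` and the convention that the first day merely supplies the previous close are assumptions made for this sketch.

```python
import math

def yang_zhang_variance(opens, highs, lows, closes):
    """Yang-Zhang variance estimate from aligned daily OHLC price lists.

    The first day only supplies the previous close, so n = len(opens) - 1.
    """
    o, u, d, c = [], [], [], []
    for t in range(1, len(opens)):
        o.append(math.log(opens[t]) - math.log(closes[t - 1]))  # normalized open
        u.append(math.log(highs[t]) - math.log(opens[t]))       # normalized high
        d.append(math.log(lows[t]) - math.log(opens[t]))        # normalized low
        c.append(math.log(closes[t]) - math.log(opens[t]))      # normalized close
    n = len(o)
    if n < 2:
        raise ValueError("need at least three days of prices")

    def sample_var(x):
        m = sum(x) / n
        return sum((v - m) ** 2 for v in x) / (n - 1)

    var_o = sample_var(o)
    var_c = sample_var(c)
    # Rogers-Satchell component
    var_rs = sum(ut * (ut - ct) + dt * (dt - ct)
                 for ut, dt, ct in zip(u, d, c)) / n
    k = 0.34 / (1.34 + (n + 1) / (n - 1))
    return var_o + k * var_c + (1.0 - k) * var_rs
```

Because the daily high is at least as large as the open and the close, and the daily low is no larger than either, every term of the Rogers and Satchell component is nonnegative, so the combined estimate cannot be negative for valid price data.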

The quantity $H_t - L_t$ is called the range of the price in the $t$th day. This
estimator has led to the use of range-based volatility estimates; see, for instance,
Alizadeh, Brandt, and Diebold (2002). In practice, stock prices are only observed
at discrete time points. As such, the observed daily high is likely lower than $H_t$
and the observed daily low is likely higher than $L_t$. Consequently, the observed
daily price range tends to underestimate the actual range and, hence, may lead 
to underestimation of volatility. This bias in volatility estimation depends on the 
trading frequency and tick size of the stocks. For intensively traded stocks, the bias 
should be negligible. For other stocks, further study is needed to better understand 
the performance of range-based volatility estimation. 
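The downward bias of the observed range can be seen in a small simulation. The sketch below is an illustration under stated assumptions rather than an analysis from the text: a finely sampled driftless Gaussian random walk stands in for the continuous price path, and keeping every `step`-th point mimics trades at discrete times. The function name and all parameter values are hypothetical.

```python
import random

def observed_ranges(n_fine=2000, step=50, seed=1):
    """Return (fine_range, coarse_range) for one simulated trading day.

    The fine path stands in for the continuous price; the coarse path keeps
    every `step`-th point, mimicking prices observed only at trade times.
    """
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n_fine):
        path.append(path[-1] + rng.gauss(0.0, 0.01))   # driftless random walk
    coarse = path[::step]
    return max(path) - min(path), max(coarse) - min(coarse)

fine, coarse = observed_ranges()
```

Since the coarse path is a subset of the fine path, its range can never exceed the fine range; the gap shrinks as `step` decreases, which is in line with the remark that the bias should be negligible for intensively traded stocks.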


3.16 KURTOSIS OF GARCH MODELS


Uncertainty in volatility estimation is an important issue, but it is often over- 
looked. To assess the variability of an estimated volatility, one must consider the 
kurtosis of a volatility model. In this section, we derive the excess kurtosis of a 
GARCH(1,1) model. The same idea applies to other GARCH models, however. 
The model considered is 


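$$a_t = \sigma_t \epsilon_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2,$$

where $\alpha_0 > 0$, $\alpha_1 \ge 0$, $\beta_1 \ge 0$, $\alpha_1 + \beta_1 < 1$, and $\{\epsilon_t\}$ is an iid sequence satisfying

$$E(\epsilon_t) = 0, \qquad \mathrm{Var}(\epsilon_t) = 1, \qquad E(\epsilon_t^4) = K_\epsilon + 3,$$

where $K_\epsilon$ is the excess kurtosis of the innovation $\epsilon_t$. Based on the assumption, we
have the following:

• $\mathrm{Var}(a_t) = E(\sigma_t^2) = \alpha_0/[1 - (\alpha_1 + \beta_1)]$.
• $E(a_t^4) = (K_\epsilon + 3)E(\sigma_t^4)$ provided that $E(\sigma_t^4)$ exists.

Taking the square of the volatility model, we have

$$\sigma_t^4 = \alpha_0^2 + \alpha_1^2 a_{t-1}^4 + \beta_1^2 \sigma_{t-1}^4 + 2\alpha_0\alpha_1 a_{t-1}^2 + 2\alpha_0\beta_1 \sigma_{t-1}^2 + 2\alpha_1\beta_1 \sigma_{t-1}^2 a_{t-1}^2.$$

Taking expectation of the equation and using the two properties mentioned earlier,
we obtain

$$E(\sigma_t^4) = \frac{\alpha_0^2 (1 + \alpha_1 + \beta_1)}{[1 - (\alpha_1 + \beta_1)][1 - \alpha_1^2 (K_\epsilon + 2) - (\alpha_1 + \beta_1)^2]}$$

provided that $1 > \alpha_1 + \beta_1 \ge 0$ and $1 - \alpha_1^2 (K_\epsilon + 2) - (\alpha_1 + \beta_1)^2 > 0$. The excess
kurtosis of $a_t$, if it exists, is then

$$K_a = \frac{E(a_t^4)}{[E(a_t^2)]^2} - 3 = \frac{(K_\epsilon + 3)[1 - (\alpha_1 + \beta_1)^2]}{1 - 2\alpha_1^2 - (\alpha_1 + \beta_1)^2 - K_\epsilon \alpha_1^2} - 3.$$
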
This excess kurtosis can be written in an informative expression. First, consider
the case that $\epsilon_t$ is normally distributed. In this case, $K_\epsilon = 0$, and some algebra
shows that

$$K_a^{(g)} = \frac{6\alpha_1^2}{1 - 2\alpha_1^2 - (\alpha_1 + \beta_1)^2},$$

where the superscript $(g)$ is used to denote Gaussian distribution. This result has
two important implications: (a) the kurtosis of $a_t$ exists if $1 - 2\alpha_1^2 - (\alpha_1 + \beta_1)^2 > 0$,
and (b) if $\alpha_1 = 0$, then $K_a^{(g)} = 0$, meaning that the corresponding GARCH(1,1)
model does not have heavy tails.

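Second, consider the case that $\epsilon_t$ is not Gaussian. Using the prior result, we
have

$$K_a = \frac{K_\epsilon - K_\epsilon(\alpha_1 + \beta_1)^2 + 6\alpha_1^2 + 3K_\epsilon\alpha_1^2}{1 - 2\alpha_1^2 - (\alpha_1 + \beta_1)^2 - K_\epsilon\alpha_1^2}$$

$$\phantom{K_a} = \frac{K_\epsilon[1 - 2\alpha_1^2 - (\alpha_1 + \beta_1)^2] + 6\alpha_1^2 + 5K_\epsilon\alpha_1^2}{1 - 2\alpha_1^2 - (\alpha_1 + \beta_1)^2 - K_\epsilon\alpha_1^2}$$

$$\phantom{K_a} = \frac{K_\epsilon + K_a^{(g)} + \frac{5}{6}K_\epsilon K_a^{(g)}}{1 - \frac{1}{6}K_\epsilon K_a^{(g)}}.$$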

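This result was obtained originally by George C. Tiao; see Bai, Russell, and Tiao
(2003). It holds for all GARCH models provided that the kurtosis exists. For
instance, if $\beta_1 = 0$, then the model reduces to an ARCH(1) model. In this case, it
is easy to verify that $K_a^{(g)} = 6\alpha_1^2/(1 - 3\alpha_1^2)$ provided that $1 > 3\alpha_1^2$ and the excess
kurtosis of $a_t$ is

$$K_a = \frac{(K_\epsilon + 3)(1 - \alpha_1^2)}{1 - (K_\epsilon + 3)\alpha_1^2} - 3 = \frac{K_\epsilon + 2K_\epsilon\alpha_1^2 + 6\alpha_1^2}{1 - 3\alpha_1^2 - K_\epsilon\alpha_1^2}$$

$$\phantom{K_a} = \frac{K_\epsilon(1 - 3\alpha_1^2) + 6\alpha_1^2 + 5K_\epsilon\alpha_1^2}{1 - 3\alpha_1^2 - K_\epsilon\alpha_1^2}$$

$$\phantom{K_a} = \frac{K_\epsilon + K_a^{(g)} + \frac{5}{6}K_\epsilon K_a^{(g)}}{1 - \frac{1}{6}K_\epsilon K_a^{(g)}}.$$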


The prior result shows that for a GARCH(1,1) model the coefficient $\alpha_1$ plays
a critical role in determining the tail behavior of $a_t$. If $\alpha_1 = 0$, then $K_a^{(g)} = 0$
and $K_a = K_\epsilon$. In this case, the tail behavior of $a_t$ is similar to that of the
standardized noise $\epsilon_t$. Yet if $\alpha_1 > 0$, then $K_a^{(g)} > 0$ and the $a_t$ process has heavy
tails.
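The two ways of writing the excess kurtosis can be checked against each other numerically. The short Python sketch below is only an illustration: the function names and the parameter values $\alpha_1 = 0.1$, $\beta_1 = 0.8$, $K_\epsilon = 1$ are hypothetical, and the coefficients 5/6 and 1/6 are those obtained by dividing the numerator and denominator of the direct expression by $1 - 2\alpha_1^2 - (\alpha_1 + \beta_1)^2$.

```python
def garch11_excess_kurtosis(alpha1, beta1, k_eps=0.0):
    """Excess kurtosis of a_t for a GARCH(1,1) model, computed directly."""
    s2 = (alpha1 + beta1) ** 2
    denom = 1.0 - 2.0 * alpha1 ** 2 - s2 - k_eps * alpha1 ** 2
    if denom <= 0.0:
        raise ValueError("fourth moment of a_t does not exist")
    return (k_eps + 3.0) * (1.0 - s2) / denom - 3.0

def kurtosis_from_gaussian(k_eps, k_gauss):
    """Same quantity written in terms of K_eps and the Gaussian-case value."""
    return ((k_eps + k_gauss + (5.0 / 6.0) * k_eps * k_gauss)
            / (1.0 - k_eps * k_gauss / 6.0))

k_g = garch11_excess_kurtosis(0.1, 0.8)              # Gaussian innovations
k_a = garch11_excess_kurtosis(0.1, 0.8, k_eps=1.0)   # heavier-tailed innovations
same = abs(kurtosis_from_gaussian(1.0, k_g) - k_a) < 1e-12
```

Setting `alpha1` to zero returns `k_eps` itself, which matches the remark that the tails of $a_t$ are then those of the standardized noise.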

For a (standardized) Student-t distribution with $v$ degrees of freedom, we have
$E(\epsilon_t^4) = 6/(v - 4) + 3$ if $v > 4$. Therefore, the excess kurtosis of $\epsilon_t$ is $K_\epsilon =
6/(v - 4)$ for $v > 4$. This is part of the reason that we used $t_5$ in the chapter when
the degrees of freedom of a t-distribution are prespecified. The excess kurtosis
of $a_t$ becomes $K_a = [6 + (v + 1)K_a^{(g)}]/[v - 4 - K_a^{(g)}]$ provided that
$1 - 2\alpha_1^2(v - 1)/(v - 4) - (\alpha_1 + \beta_1)^2 > 0$.
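For Student-t innovations the formula is simple enough to evaluate by hand, but a small function makes the existence condition explicit. This is a hedged Python sketch with a hypothetical function name and example parameters; it is not code from the original text.

```python
def student_t_garch_kurtosis(alpha1, beta1, v):
    """Excess kurtosis of a_t when eps_t is standardized Student-t, v > 4."""
    if v <= 4:
        raise ValueError("the kurtosis of eps_t requires v > 4")
    s2 = (alpha1 + beta1) ** 2
    # Existence condition for the kurtosis of a_t under t innovations.
    if 1.0 - 2.0 * alpha1 ** 2 * (v - 1.0) / (v - 4.0) - s2 <= 0.0:
        raise ValueError("existence condition for the kurtosis of a_t fails")
    k_gauss = 6.0 * alpha1 ** 2 / (1.0 - 2.0 * alpha1 ** 2 - s2)
    return (6.0 + (v + 1.0) * k_gauss) / (v - 4.0 - k_gauss)

k5 = student_t_garch_kurtosis(0.1, 0.8, 5)   # t_5 innovations
```

As `v` grows the value approaches the Gaussian-case excess kurtosis, since the excess kurtosis of the innovation, $6/(v-4)$, shrinks to zero.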


APPENDIX: SOME RATS PROGRAMS FOR ESTIMATING VOLATILITY
MODELS


The data file used in the illustration is sp500.txt, which contains the monthly
excess returns of the S&P 500 index with 792 observations. Comments in a RATS
program start with *.


A Gaussian GARCH(1,1) Model with a Constant Mean Equation 


all 0 792:1
open data sp500.txt
data(org=obs) / rt
*** initialize the conditional variance function
set h = 0.0
*** specify the parameters of the model
nonlin mu a0 a1 b1
*** specify the mean equation
frml at = rt(t)-mu
*** specify the volatility equation
frml gvar = a0+a1*at(t-1)**2+b1*h(t-1)
*** specify the log likelihood function
frml garchln = -0.5*log(h(t)=gvar(t))-0.5*at(t)**2/h(t)
*** sample period used in estimation
smpl 2 792
*** initial estimates
compute a0 = 0.01, a1 = 0.1, b1 = 0.5, mu = 0.1
maximize(method=bhhh,recursive,iterations=150) garchln
set fv = gvar(t)
set resid = at(t)/sqrt(fv(t))
set residsq = resid(t)*resid(t)
*** Checking standardized residuals
cor(qstats,number=20,span=10) resid
*** Checking squared standardized residuals
cor(qstats,number=20,span=10) residsq
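
For readers without RATS, the objective function that the program above maximizes can be written out explicitly. The Python sketch below is an illustration under stated assumptions, not a translation endorsed by the original text: it evaluates the log likelihood only (no optimizer), drops constant terms as the RATS formula does, and starts the variance recursion at the second observation with an initial variance of 0.0, mirroring the `set h = 0.0` and `smpl 2 792` lines. The function name `garch11_loglik` is hypothetical.

```python
import math

def garch11_loglik(returns, mu, a0, a1, b1, h0=0.0):
    """Conditional Gaussian log likelihood of a GARCH(1,1) model.

    Constant terms are dropped. The recursion starts at the second
    observation with initial conditional variance h0 (an assumption).
    """
    total = 0.0
    h_prev = h0
    a_prev = returns[0] - mu
    for r in returns[1:]:
        h = a0 + a1 * a_prev ** 2 + b1 * h_prev   # volatility equation
        a = r - mu                                 # mean equation
        total += -0.5 * math.log(h) - 0.5 * a * a / h
        h_prev, a_prev = h, a
    return total
```

The returned value could be passed, with its sign flipped, to any general-purpose numerical minimizer; positivity constraints on the parameters would then need to be handled by the caller.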


A GARCH(1,1) Model with Student-t Innovation 


all 0 792:1
open data sp500.txt
data(org=obs) / rt
set h = 0.0
nonlin mu a0 a1 b1 v
frml at = rt(t)-mu
frml gvar = a0+a1*at(t-1)**2+b1*h(t-1)
frml tt = at(t)**2/(h(t)=gvar(t))
frml tln = %LNGAMMA((v+1)/2.)-%LNGAMMA(v/2.)-0.5*log(v-2.)
frml gln = tln-((v+1)/2.)*log(1.0+tt(t)/(v-2.0))-0.5*log(h(t))
smpl 2 792
compute a0 = 0.01, a1 = 0.1, b1 = 0.5, mu = 0.1, v = 10
maximize(method=bhhh,recursive,iterations=150) gln
set fv = gvar(t)
set resid = at(t)/sqrt(fv(t))
set residsq = resid(t)*resid(t)
cor(qstats,number=20,span=10) resid
cor(qstats,number=20,span=10) residsq
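
The Student-t likelihood in the program above differs from the Gaussian one only in the per-observation term. A minimal Python sketch of that objective follows; as before it is illustrative, the function name is hypothetical, the constant involving $\pi$ is dropped as in the RATS formula, and the zero initial variance is an assumption carried over from the listing.

```python
import math

def garch11_t_loglik(returns, mu, a0, a1, b1, v, h0=0.0):
    """Conditional log likelihood of a GARCH(1,1) model with standardized
    Student-t innovations (v > 2 degrees of freedom), constants dropped."""
    if v <= 2:
        raise ValueError("v must exceed 2 for a standardized t distribution")
    const = (math.lgamma((v + 1.0) / 2.0) - math.lgamma(v / 2.0)
             - 0.5 * math.log(v - 2.0))
    total = 0.0
    h_prev = h0
    a_prev = returns[0] - mu
    for r in returns[1:]:
        h = a0 + a1 * a_prev ** 2 + b1 * h_prev
        a = r - mu
        tt = a * a / h                        # squared standardized shock
        total += (const - ((v + 1.0) / 2.0) * math.log(1.0 + tt / (v - 2.0))
                  - 0.5 * math.log(h))
        h_prev, a_prev = h, a
    return total
```

For very large `v` the value approaches the Gaussian log likelihood shifted by a constant of $-\tfrac{1}{2}\ln 2$ per observation, which is a convenient sanity check on an implementation.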


An AR(1)-EGARCH(1,1) Model for Monthly Log Returns of IBM Stock


all 0 864:1
open data m-ibm.txt
data(org=obs) / rt
set h = 0.0
nonlin c0 p1 th ga a0 a1
frml at = rt(t)-c0-p1*rt(t-1)
frml epsi = at(t)/(sqrt(exp(h(t))))
frml g = th*epsi(t)+ga*(abs(epsi(t))-sqrt(2./%PI))
frml gvar = a1*h(t-1)+(1-a1)*a0+g(t-1)
frml garchln = -0.5*(h(t)=gvar(t))-0.5*epsi(t)**2
smpl 3 864
compute c0 = 0.01, p1 = 0.01, th = 0.1, ga = 0.1
compute a0 = 0.01, a1 = 0.5
maximize(method=bhhh,recursive,iterations=150) garchln
set fv = gvar(t)
set resid = epsi(t)
set residsq = resid(t)*resid(t)
cor(qstats,number=20,span=10) resid
cor(qstats,number=20,span=10) residsq


EXERCISES 


3.1. Derive multistep-ahead forecasts for a GARCH(1,2) model at the forecast
origin h.

3.2. Derive multistep-ahead forecasts for a GARCH(2,1) model at the forecast 
origin h. 

3.3. Suppose that $r_1, \ldots, r_n$ are observations of a return series that follows the
AR(1)-GARCH(1,1) model

$$r_t = \mu + \phi_1 r_{t-1} + a_t, \qquad a_t = \sigma_t \epsilon_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2,$$

where $\epsilon_t$ is a standard Gaussian white noise series. Derive the conditional
log-likelihood function of the data.

3.4. In the equation in Exercise 3.3, assume that $\epsilon_t$ follows a standardized
Student-t distribution with $v$ degrees of freedom. Derive the conditional log-likelihood
function of the data.

3.5. Consider the monthly simple returns of Intel stock from January 1973 to
December 2008 in m-intc7308.txt. Transform the returns into log returns.
Build a GARCH model for the transformed series and compute 1-step- to
5-step-ahead volatility forecasts at the forecast origin December 2008.

3.6. The file m-mrk4608.txt contains monthly simple returns of Merck stock
from June 1946 to December 2008. The file has two columns denoting date
and simple return. Transform the simple returns to log returns.


(a) Is there any evidence of serial correlations in the log returns? Use auto- 
correlations and 5% significance level to answer the question. If yes, 
remove the serial correlations. 

(b) Is there any evidence of ARCH effects in the log returns? Use the
residual series if there are serial correlations in part (a). Use Ljung-Box
statistics for the squared returns (or residuals) with 6 and 12 lags of
autocorrelations and 5% significance level to answer the question.

(c) Identify an ARCH model for the data and fit the identified model. Write 
down the fitted model. 


3.7. The file m-3m4608.txt contains two columns. They are date and the
monthly simple return for 3M stock. Transform the returns to log returns.


(a) Is there any evidence of ARCH effects in the log returns? Use Ljung-Box
statistics with 6 and 12 lags of autocorrelations and 5% significance level
to answer the question.

(b) Use the PACF of the squared returns to identify an ARCH model. What 
is the fitted model? 

(c) There are 755 data points. Refit the model using the first 750 observations 
and use the fitted model to predict the volatilities for t from 751 to 755 
(the forecast origin is 750). 

(d) Build an ARCH-M model for the log return series of 3M stock. Test the 
hypothesis that the risk premium is zero at the 5% significance level. 
Draw your conclusion. 

(e) Build an EGARCH model for the log return series of 3M stock using 
the first 750 observations. Use the fitted model to compute 1-step- to 
5-step-ahead volatility forecasts at the forecast origin h = 750. 


3.8. The file m-gmsp5008.txt contains the dates and monthly simple returns of
General Motors stock and the S&P 500 index from 1950 to 2008.


(a) Build a GARCH model with Gaussian innovations for the log returns of 
GM stock. Check the model and write down the fitted model. 


(b) Build a GARCH-M model with Gaussian innovations for the log returns 
of GM stock. What is the fitted model? 


(c) Build a GARCH model with Student-t distribution for the log returns
of GM stock, including estimation of the degrees of freedom. Write
down the fitted model. Let $v$ be the degrees of freedom of the Student-t
distribution. Test the hypothesis $H_0: v = 6$ versus $H_a: v \ne 6$, using the
5% significance level.

(d) Build an EGARCH model for the log returns of GM stock. What is the 
fitted model? 

(e) Obtain 1-step- to 6-step-ahead volatility forecasts for all the models
obtained. Compare the forecasts.

3.9. Consider the monthly log returns of GM stock in m-gmsp5008.txt. Build
an adequate TGARCH model for the series. Write down the fitted model and
test for the significance of the leverage effect. Obtain 1-step- to 6-step-ahead
volatility forecasts.

3.10. Again, consider the returns in m-gmsp5008.txt.


(a) Build a Gaussian GARCH model for the monthly log returns of the S&P 
500 index. Check the model carefully. 

(b) Is there a summer effect on the volatility of the index return? Use the 
GARCH model built in part (a) to answer this question. 

(c) Are lagged returns of GM stock useful in modeling the index volatil- 
ity? Again, use the GARCH model of part (a) as a baseline model for 
comparison. 


3.11. The file d-gmsp9908.txt contains the daily simple returns of GM stock and
the S&P composite index from 1999 to 2008. It has three columns denoting
date, GM return, and S&P return.


(a) Compute the daily log returns of GM stock. Is there any evidence of 
ARCH effects in the log returns? You may use 10 lags of the squared 
returns and 5% significance level to perform the test. 

(b) Compute the PACF of the squared log returns (10 lags). 

(c) Specify a GARCH model for the GM log return using a normal distri- 
bution for the innovations. Perform model checking and write down the 
fitted model. 

(d) Find an adequate GARCH model for the series but using the generalized 
error distribution for the innovations. Write down the fitted model. 

3.12. Consider the daily simple returns of the S&P composite index in the file
d-gmsp9908.txt.


(a) Is there any ARCH effect in the simple return series? Use 10 lags of the 
squared returns and 5% significance level to perform the test. 


(b) Build an adequate GARCH model for the simple return series. 


(c) Compute 1-step- to 4-step-ahead forecasts of the simple return and its
volatility based on the fitted model. 


3.13. Again, consider the daily simple returns of GM stock in the file
d-gmsp9908.txt.

(a) Find an adequate GARCH-M model for the series. Write down the fitted
model.

(b) Find an adequate EGARCH model for the series. Is the “leverage” effect
significant at the 5% level?

3.14. Revisit the file d-gmsp9908.txt. However, we shall investigate the value 
of using market volatility in modeling volatility of individual stocks. Convert 
the two simple return series into percentage log return series. 


(a) Build an AR(5)-GARCH(1,1) model with generalized error distribution
for the log S&P returns. The AR(5) contains only lags 3 and 5. Denote 
the fitted volatility series by spvol. 


(b) Estimate a GARCH(1,1) model with spvol as an exogenous variable to 
the log GM return series. Check the adequacy of the model, and write 
down the fitted model. In S-Plus, the command is 


fit = garch(gm ~ 1, ~garch(1,1)+spvol, cond.dist='ged')


(c) Discuss the implication of the fitted model. 

3.15. Again, consider the percentage daily log returns of GM stock and the S&P 
500 index from 1999 to 2008 as before, but we shall investigate whether 
the volatility of GM stock has any contribution in modeling the S&P index 
volatility. Follow the steps below to perform the analysis. 


(a) Fit a GARCH(1,1) model with generalized error distribution to the per- 
centage log returns of GM stock. Denote the fitted volatility by gmvol. 
Build an adequate GARCH model plus gmvol as the exogenous variable 
for the log S&P return series. Write down the fitted model. 

(b) Is the volatility of GM stock returns helpful in modeling the volatility 
of the S&P index returns? Why? 


CHAPTER 4


Nonlinear Models
and Their Applications


This chapter focuses on nonlinearity in financial data and nonlinear econometric
models useful in analysis of financial time series. Consider a univariate time series
$x_t$, which, for simplicity, is observed at equally spaced time points. We denote
the observations by $\{x_t | t = 1, \ldots, T\}$, where $T$ is the sample size. As stated in
Chapter 2, a purely stochastic time series $x_t$ is said to be linear if it can be written as

$$x_t = \mu + \sum_{i=0}^{\infty} \psi_i a_{t-i}, \qquad (4.1)$$

where $\mu$ is a constant, $\psi_i$ are real numbers with $\psi_0 = 1$, and $\{a_t\}$ is a sequence of
independent and identically distributed (iid) random variables with a well-defined
distribution function. We assume that the distribution of $a_t$ is continuous and
$E(a_t) = 0$. In many cases, we further assume that $\mathrm{Var}(a_t) = \sigma_a^2$ or, even stronger,
that $a_t$ is Gaussian. If $\sigma_a^2 \sum_{i=1}^{\infty} \psi_i^2 < \infty$, then $x_t$ is weakly stationary (i.e., the first
two moments of $x_t$ are time invariant). The ARMA process of Chapter 2 is linear
because it has an MA representation in Eq. (4.1). Any stochastic process that does
not satisfy the condition of Eq. (4.1) is said to be nonlinear. The prior definition
of nonlinearity is for purely stochastic time series. One may extend the definition
by allowing the mean of $x_t$ to be a linear function of some exogenous variables,
including the time index and some periodic functions. But such a mean function
can be handled easily by the methods discussed in Chapter 2, and we do not discuss
it here. Mathematically, a purely stochastic time series model for $x_t$ is a function
of an iid sequence consisting of the current and past shocks, that is,

$$x_t = f(a_t, a_{t-1}, \ldots). \qquad (4.2)$$

The linear model in Eq. (4.1) says that $f(\cdot)$ is a linear function of its arguments.
Any nonlinearity in $f(\cdot)$ results in a nonlinear model. The general nonlinear model
in Eq. (4.2) is not directly applicable because it contains too many parameters.

To put nonlinear models available in the literature in a proper perspective, we
write the model of $x_t$ in terms of its conditional moments. Let $F_{t-1}$ be the $\sigma$-field
generated by available information at time $t-1$ (inclusive). Typically, $F_{t-1}$
denotes the collection of linear combinations of elements in $\{x_{t-1}, x_{t-2}, \ldots\}$ and
$\{a_{t-1}, a_{t-2}, \ldots\}$. The conditional mean and variance of $x_t$ given $F_{t-1}$ are

$$\mu_t = E(x_t | F_{t-1}) \equiv g(F_{t-1}), \qquad \sigma_t^2 = \mathrm{Var}(x_t | F_{t-1}) \equiv h(F_{t-1}), \qquad (4.3)$$

where $g(\cdot)$ and $h(\cdot)$ are well-defined functions with $h(\cdot) > 0$. Thus, we restrict the
model to

$$x_t = g(F_{t-1}) + \sqrt{h(F_{t-1})}\,\epsilon_t,$$

where $\epsilon_t = a_t/\sigma_t$ is a standardized shock (or innovation). For the linear series $x_t$
in Eq. (4.1), $g(\cdot)$ is a linear function of elements of $F_{t-1}$ and $h(\cdot) = \sigma_a^2$. The
development of nonlinear models involves making extensions of the two equations
in Eq. (4.3). If $g(\cdot)$ is nonlinear, $x_t$ is said to be nonlinear in mean. If $h(\cdot)$ is time
variant, then $x_t$ is nonlinear in variance. The conditional heteroscedastic models of
Chapter 3 are nonlinear in variance because their conditional variances $\sigma_t^2$ evolve
over time. In fact, except for the GARCH-M models, in which $\mu_t$ depends on $\sigma_t^2$
and hence also evolves over time, all of the volatility models of Chapter 3 focus
on modifications or extensions of the conditional variance equation in Eq. (4.3).
Based on the well-known Wold decomposition, a weakly stationary and purely
stochastic time series can be expressed as a linear function of uncorrelated shocks.
For stationary volatility series, these shocks are uncorrelated but dependent. The
models discussed in this chapter represent another extension to nonlinearity derived
from modifying the conditional mean equation in Eq. (4.3).

Many nonlinear time series models have been proposed in the statistical liter- 
ature, such as the bilinear models of Granger and Andersen (1978), the threshold 
autoregressive (TAR) model of Tong (1978), the state-dependent model of Priest- 
ley (1980), and the Markov switching model of Hamilton (1989). The basic idea 
underlying these nonlinear models is to let the conditional mean $\mu_t$ evolve over
time according to some simple parametric nonlinear function. Recently, a number 
of nonlinear models have been proposed by making use of advances in comput- 
ing facilities and computational methods. Examples of such extensions include the 
nonlinear state-space modeling of Carlin, Polson, and Stoffer (1992), the functional 
coefficient autoregressive model of Chen and Tsay (1993a), the nonlinear additive 
autoregressive model of Chen and Tsay (1993b), and the multivariate adaptive 
regression spline of Lewis and Stevens (1991). The basic idea of these extensions 
is either using simulation methods to describe the evolution of the conditional
distribution of $x_t$ or using data-driven methods to explore the nonlinear characteristics
of a series. Finally, nonparametric and semiparametric methods such as kernel
regression and artificial neural networks have also been applied to explore the
nonlinearity in a time series. We discuss some nonlinear models in Section 4.1 that
are applicable to financial time series. The discussion includes some nonparametric 
and semiparametric methods. 

Apart from the development of various nonlinear models, there is substantial 
interest in studying test statistics that can discriminate linear series from nonlinear 
ones. Both parametric and nonparametric tests are available. Most parametric tests 
employ either the Lagrange multiplier or likelihood ratio statistics. Nonparametric 
tests depend on either higher order spectra of $x_t$ or the concept of dimension
correlation developed for chaotic time series. We review some nonlinearity tests 
in Section 4.2. Sections 4.3 and 4.4 discuss modeling and forecasting of nonlinear 
models. Finally, an application of nonlinear models is given in Section 4.5. 


4.1 NONLINEAR MODELS 


Most nonlinear models developed in the statistical literature focus on the conditional 
mean equation in Eq. (4.3); see Priestley (1988) and Tong (1990) for summaries 
of nonlinear models. Our goal here is to introduce some nonlinear models that are 
applicable to financial time series. 


4.1.1 Bilinear Model 


The linear model in Eq. (4.1) is simply the first-order Taylor series expansion of 
the f(-) function in Eq. (4.2). As such, a natural extension to nonlinearity is to 
employ the second-order terms in the expansion to improve the approximation. 
This is the basic idea of bilinear models, which can be defined as 


$$x_t = c + \sum_{i=1}^{p} \phi_i x_{t-i} - \sum_{j=1}^{q} \theta_j a_{t-j} + \sum_{i=1}^{m}\sum_{j=1}^{s} \beta_{ij} x_{t-i} a_{t-j} + a_t, \qquad (4.4)$$

where $p$, $q$, $m$, and $s$ are nonnegative integers. This model was introduced by
Granger and Andersen (1978) and has been widely investigated. Subba Rao and 
Gabr (1984) discuss some properties and applications of the model, and Liu and 
Brockwell (1988) study general bilinear models. Properties of bilinear models such 
as stationarity conditions are often derived by (a) putting the model in a state- 
space form (see Chapter 11) and (b) using the state transition equation to express 
the state as a product of past innovations and random coefficient vectors. A special 
generalization of the bilinear model in Eq. (4.4) has conditional heteroscedasticity. 
For example, consider the model 


$$x_t = \mu + \sum_{i=1}^{s} \beta_i a_{t-i} a_t + a_t, \qquad (4.5)$$

where $\{a_t\}$ is a white noise series. The first two conditional moments of $x_t$ are

$$E(x_t | F_{t-1}) = \mu, \qquad \mathrm{Var}(x_t | F_{t-1}) = \left(1 + \sum_{i=1}^{s} \beta_i a_{t-i}\right)^2 \sigma_a^2,$$

which are similar to those of the RCA or CHARMA model of Chapter 3.
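
To see how the conditional variance of model (4.5) moves with past shocks, one can simulate the model and record the conditional variance at each step. The Python sketch below is illustrative only: it assumes Gaussian white noise and zero presample shocks, and the function name and coefficient values are hypothetical.

```python
import random

def simulate_bilinear(n, mu, betas, sigma_a=1.0, seed=0):
    """Simulate x_t = mu + (1 + sum_i beta_i a_{t-i}) a_t with Gaussian a_t.

    Returns the series and the conditional variances given F_{t-1}.
    """
    rng = random.Random(seed)
    a = [0.0] * len(betas)             # presample shocks set to zero
    x, cond_var = [], []
    for _ in range(n):
        # a[-i] is a_{t-i} because the current shock is not yet appended
        scale = 1.0 + sum(b * a[-i] for i, b in enumerate(betas, start=1))
        shock = rng.gauss(0.0, sigma_a)
        a.append(shock)
        x.append(mu + scale * shock)
        cond_var.append(scale ** 2 * sigma_a ** 2)
    return x, cond_var

x, cond_var = simulate_bilinear(500, mu=0.5, betas=[0.3, -0.2])
```

The conditional mean stays at `mu` throughout, whereas `cond_var` changes with the most recent shocks, which is the sense in which this special bilinear model exhibits conditional heteroscedasticity.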


Example 4.1. Consider the monthly simple returns of the CRSP equal-weighted
index from January 1926 to December 2008 for 996 observations.
Denote the series by $R_t$. The sample PACF of $R_t$ shows significant partial
autocorrelations at lags 1 and 3 so that an AR(3) model is used. The squared
series of the AR(3) residuals suggests that the conditional heteroscedasticity might
depend on lags 1, 3, and 8 of the residuals. Therefore, we employ the special
bilinear model

$$R_t = \mu + \phi_1 R_{t-1} + \phi_3 R_{t-3} + (1 + \beta_1 a_{t-1} + \beta_3 a_{t-3}) a_t$$

for the series, where $a_t = \beta_0 \epsilon_t$ with $\epsilon_t$ being an iid series with mean zero and
variance 1. Note that lag 8 is omitted for simplicity. Assuming that the conditional
distribution of $a_t$ is normal, we use the conditional maximum-likelihood method
and obtain the fitted model

$$R_t = 0.0114 + 0.167R_{t-1} - 0.095R_{t-3} + 0.071(1 + 0.377a_{t-1} - 0.646a_{t-3})\epsilon_t, \qquad (4.6)$$

where the standard errors of the parameters are, in the order of appearance, 0.0023,
0.032, 0.027, 0.002, 0.147, and 0.136, respectively. All estimates are significantly
different from zero at the 5% level. Define

$$\hat{\epsilon}_t = \frac{R_t - 0.0114 - 0.167R_{t-1} + 0.095R_{t-3}}{0.071(1 + 0.377\hat{a}_{t-1} - 0.646\hat{a}_{t-3})},$$

where $\hat{\epsilon}_t = 0$ for $t \le 3$, as the standardized residual series of the model. The sample
ACF of $\hat{\epsilon}_t$ shows no significant serial correlations, but the series is not independent
because the squared series $\hat{\epsilon}_t^2$ has significant serial correlations. The validity of
model (4.6) deserves further investigation. For comparison, we also consider an
AR(3)-ARCH(3) model for the series and obtain

$$R_t = 0.013 + 0.223R_{t-1} + 0.006R_{t-2} - 0.013R_{t-3} + a_t,$$
$$\sigma_t^2 = 0.002 + 0.185a_{t-1}^2 + 0.301a_{t-2}^2 + 0.197a_{t-3}^2, \qquad (4.7)$$

where all estimates but the coefficients of $R_{t-2}$ and $R_{t-3}$ are highly significant.
The standardized residual series of the model shows no serial correlations, but the
squared residuals show $Q(10) = 19.78$ with a p value of 0.031. Models (4.6) and
(4.7) appear to be similar, but the latter seems to fit the data better. Further study
shows that an AR(1)-GARCH(1,1) model fits the data well.
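
The standardized residuals of model (4.6) defined in this example must be computed recursively, because the denominator at time $t$ involves earlier residuals. The Python sketch below shows one way to do so; it is an illustration rather than the procedure used for the example, and it assumes that $\hat{a}_t = 0.071\hat{\epsilon}_t$ (from $a_t = \beta_0\epsilon_t$) and that the first three standardized residuals are set to zero. The function name is hypothetical.

```python
def bilinear_std_residuals(r):
    """Standardized residuals of fitted model (4.6) for a return list r."""
    n = len(r)
    eps = [0.0] * n          # eps_hat_t = 0 for the first three observations
    a = [0.0] * n            # a_hat_t = 0.071 * eps_hat_t (assumption)
    for t in range(3, n):
        num = r[t] - 0.0114 - 0.167 * r[t - 1] + 0.095 * r[t - 3]
        den = 0.071 * (1.0 + 0.377 * a[t - 1] - 0.646 * a[t - 3])
        eps[t] = num / den
        a[t] = 0.071 * eps[t]
    return eps
```

The recursion breaks down if a denominator is numerically zero, so a production version would need to guard against that case.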


4.1.2 Threshold Autoregressive (TAR) Model


This model is motivated by several nonlinear characteristics commonly observed 
in practice such as asymmetry in declining and rising patterns of a process. It uses 
piecewise linear models to obtain a better approximation of the conditional mean 
equation. However, in contrast to the traditional piecewise linear model that allows 
for model changes to occur in the “time” space, the TAR model uses threshold 
space to improve linear approximation. Let us start with a simple 2-regime AR(1) 
model:

$$x_t = \begin{cases} -1.5x_{t-1} + a_t & \text{if } x_{t-1} < 0, \\ 0.5x_{t-1} + a_t & \text{if } x_{t-1} \ge 0, \end{cases} \qquad (4.8)$$

where the $a_t$ are iid $N(0, 1)$. Here the threshold variable is $x_{t-1}$ so that the delay
is 1, and the threshold is 0. Figure 4.1 shows the time plot of a simulated series
of $x_t$ with 200 observations. A horizontal line of zero is added to the plot, which
illustrates several characteristics of TAR models. First, despite the coefficient $-1.5$
in the first regime, the process $x_t$ is geometrically ergodic and stationary. In fact,
the necessary and sufficient condition for model (4.8) to be geometrically ergodic
is $\phi_1^{(1)} < 1$, $\phi_1^{(2)} < 1$, and $\phi_1^{(1)}\phi_1^{(2)} < 1$, where $\phi_1^{(i)}$ is the AR coefficient of regime
$i$; see Petruccelli and Woolford (1984) and Chen and Tsay (1991). Ergodicity is
an important concept in time series analysis. For example, the statistical theory
showing that the sample mean $\bar{x} = (\sum_{t=1}^{T} x_t)/T$ of $x_t$ converges to the mean of $x_t$
is referred to as the ergodic theorem, which can be regarded as the counterpart of
the central limit theory for the iid case. Second, the series exhibits an asymmetric
increasing and decreasing pattern. If $x_{t-1}$ is negative, then $x_t$ tends to switch to a
positive value due to the negative and explosive coefficient $-1.5$. Yet when $x_{t-1}$ is
positive, it tends to take multiple time indexes for $x_t$ to reduce to a negative value.
Consequently, the time plot of $x_t$ shows that regime 2 has more observations than
regime 1, and the series contains large upward jumps when it becomes negative.
The series is therefore not time reversible. Third, the model contains no constant
terms, but $E(x_t)$ is not zero. The sample mean of the particular realization is
0.61 with a standard deviation of 0.07. In general, $E(x_t)$ is a weighted average
of the conditional means of the two regimes, which are nonzero. The weight for
each regime is simply the probability that $x_t$ is in that regime under its stationary
distribution. It is also clear from the discussion that, for a TAR model to have
zero mean, nonzero constant terms in some of the regimes are needed. This is very
different from a stationary linear model for which a nonzero constant implies that
the mean of $x_t$ is not zero.

Figure 4.1 Time plot of simulated 2-regime TAR(1) series.
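
The characteristics just described are easy to reproduce by simulation. The following Python sketch simulates model (4.8) with standard Gaussian shocks; the function name, the burn-in length, the sample size, and the seed are arbitrary choices made for illustration, and the realization differs from the one plotted in Figure 4.1.

```python
import random

def simulate_tar(n, burn_in=200, seed=42):
    """Simulate model (4.8): slope -1.5 when x_{t-1} < 0, slope 0.5 otherwise."""
    rng = random.Random(seed)
    x_prev = 0.0
    out = []
    for t in range(n + burn_in):
        phi = -1.5 if x_prev < 0 else 0.5          # regime chosen by x_{t-1}
        x_prev = phi * x_prev + rng.gauss(0.0, 1.0)
        if t >= burn_in:                            # discard start-up values
            out.append(x_prev)
    return out

series = simulate_tar(5000)
mean = sum(series) / len(series)
share_nonneg = sum(1 for v in series if v >= 0) / len(series)
```

With a long series the sample mean is positive even though the model has no constant term, and more than half of the observations are nonnegative, so the second regime is visited more often than the first.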

A time series $x_t$ is said to follow a $k$-regime self-exciting TAR (SETAR) model
with threshold variable $x_{t-d}$ if it satisfies

$$x_t = \phi_0^{(j)} + \phi_1^{(j)} x_{t-1} + \cdots + \phi_p^{(j)} x_{t-p} + a_t^{(j)}, \quad \text{if } \gamma_{j-1} \le x_{t-d} < \gamma_j, \qquad (4.9)$$

where $k$ and $d$ are positive integers, $j = 1, \ldots, k$, $\gamma_i$ are real numbers such that
$-\infty = \gamma_0 < \gamma_1 < \cdots < \gamma_{k-1} < \gamma_k = \infty$, the superscript $(j)$ is used to signify the
regime, and $\{a_t^{(j)}\}$ are iid sequences with mean 0 and variance $\sigma_j^2$ and are mutually
independent for different $j$. The parameter $d$ is referred to as the delay parameter
and $\gamma_j$ are the thresholds. Here it is understood that the AR models are different
for different regimes; otherwise, the number of regimes can be reduced. Equation
(4.9) says that a SETAR model is a piecewise linear AR model in the threshold
space. It is similar in spirit to the usual piecewise linear models in regression
analysis, where model changes occur in the order in which observations are taken.
The SETAR model is nonlinear provided that $k > 1$.
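
The regime selection in Eq. (4.9) amounts to locating $x_{t-d}$ among the ordered thresholds. A minimal Python sketch follows; the function name and argument layout are hypothetical, and the half-open convention $\gamma_{j-1} \le x_{t-d} < \gamma_j$ is implemented with `bisect_right` from the standard library.

```python
from bisect import bisect_right

def setar_conditional_mean(history, thresholds, coefs, d):
    """Regime and conditional mean of x_t for a SETAR model.

    history[-1] is x_{t-1}; thresholds holds the interior values
    gamma_1 < ... < gamma_{k-1}; coefs[j] = [phi_0, phi_1, ..., phi_p]
    are the AR coefficients of regime j + 1.
    """
    z = history[-d]                      # threshold variable x_{t-d}
    j = bisect_right(thresholds, z)      # number of thresholds <= z
    phi = coefs[j]
    mean = phi[0] + sum(p * history[-i] for i, p in enumerate(phi[1:], start=1))
    return j + 1, mean

# The 2-regime model (4.8): threshold 0, delay 1, no constant terms.
regime, mean = setar_conditional_mean([2.0, -1.0], [0.0],
                                      [[0.0, -1.5], [0.0, 0.5]], d=1)
```

Adding a regime only requires one more threshold and one more coefficient list, which reflects the piecewise linear structure of the model in the threshold space.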

Properties of general SETAR models are hard to obtain, but some of them 
can be found in Tong (1990), Chan (1993), Chan and Tsay (1998), and the refer- 
ences therein. In recent years, there is increasing interest in TAR models and their 
applications; see, for instance, Hansen (1997), Tsay (1998), and Montgomery et 
al. (1998). Tsay (1989) proposed a testing and modeling procedure for univariate 
SETAR models. The model in Eq. (4.9) can be generalized by using a threshold 
variable $z_t$ that is measurable with respect to $F_{t-1}$ (i.e., a function of elements of
$F_{t-1}$). The main requirements are that $z_t$ is stationary with a continuous distribution
function over a compact subset of the real line and that $z_{t-d}$ is known at time $t$.
Such a generalized model is referred to as an open-loop TAR model. 


Example 4.2. To demonstrate the application of TAR models, consider the 
U.S. monthly civilian unemployment rate, seasonally adjusted and measured in 
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Figure 4.2 Time plot of monthly U.S. civilian unemployment rate, seasonally adjusted, from January 
1948 to March 2009. 


percentage, from January 1948 to March 2009 for 735 observations. The data 
are obtained from the Bureau of Labor Statistics, Department of Labor, and are 
shown in Figure 4.2. The plot shows two main characteristics of the data. First, 
there appears to be a slow but upward trend in the overall unemployment rate. 
Second, the unemployment rate tends to increase rapidly and decrease slowly. 
Thus, the series is not time reversible and may not be unit-root stationary, 
either. 

Because the sample autocorrelation function decays slowly, we employ the first
differenced series $y_t = (1 - B)u_t$ in the analysis, where $u_t$ is the monthly
unemployment rate. Using univariate ARIMA models, we obtain the model

$$(1 - 1.13B + 0.27B^2)(1 - 0.51B^{12})y_t = (1 - 1.12B + 0.44B^2)(1 - 0.82B^{12})a_t, \tag{4.10}$$

where $\hat{\sigma}_a = 0.187$ and all estimates but the AR(2) coefficient are statistically
significant at the 5% level. The $t$ ratio of the estimate of the AR(2) coefficient is $-1.66$.
The residuals of model (4.10) give $Q(12) = 12.3$ and $Q(24) = 25.5$.
The corresponding $p$ values are 0.056 and 0.11, based on $\chi^2$ distributions
with 6 and 18 degrees of freedom, respectively. Thus, the fitted model adequately
describes the serial dependence of the data. Note that the seasonal AR and MA
coefficients are highly significant with standard errors 0.049 and 0.035, respectively,
even though the data were seasonally adjusted. The adequacy of seasonal
adjustment deserves further study. Using model (4.10), we obtain the 1-step-ahead
forecast of 8.8 for the April 2009 unemployment rate, which is close to the actual
value of 8.9.
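
The $p$ values of the Ljung-Box statistics quoted for model (4.10) can be checked with the chi-squared distribution function of R. The following is a minimal sketch; the helper name lb_pvalue is ours, and it assumes that the degrees of freedom equal the number of lags minus the six estimated ARMA coefficients of the model.

```r
# p value of a Ljung-Box statistic Q(m) computed from the residuals of a
# fitted ARMA-type model; 'npar' is the number of estimated AR and MA
# coefficients, which is subtracted from the degrees of freedom.
lb_pvalue <- function(Q, m, npar) {
  pchisq(Q, df = m - npar, lower.tail = FALSE)
}

# Model (4.10) has 2 + 1 AR and 2 + 1 MA coefficients, so npar = 6.
p12 <- lb_pvalue(12.3, m = 12, npar = 6)   # 6 degrees of freedom
p24 <- lb_pvalue(25.5, m = 24, npar = 6)   # 18 degrees of freedom
print(round(c(p12, p24), 3))
```

Both values agree, after rounding, with the $p$ values reported in the text.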

To model nonlinearity in the data, we employ TAR models and obtain the
model

$$y_t = \begin{cases} 0.083y_{t-2} + 0.158y_{t-3} + 0.118y_{t-4} - 0.180y_{t-12} + a_{1t} & \text{if } y_{t-1} \le 0.1, \\ 0.421y_{t-2} + 0.239y_{t-3} - 0.127y_{t-12} + a_{2t} & \text{if } y_{t-1} > 0.1, \end{cases} \tag{4.11}$$

where the standard errors of $a_{it}$ are 0.180 and 0.217, respectively, the standard
errors of the AR parameters in regime 1 are 0.046, 0.043, 0.042, and 0.037, whereas
those of the AR parameters in regime 2 are 0.054, 0.057, and 0.075, respectively.
The numbers of data points in regimes 1 and 2 are 460 and 262, respectively. The
standardized residuals of model (4.11) show only some minor serial correlation at
lag 12. Based on the fitted TAR model, the dynamic dependence in the data appears
to be stronger when the change in monthly unemployment rate is greater than
0.1%. This is understandable because a substantial increase in the unemployment
rate is indicative of weakening in the U.S. economy, and policy makers might be
more inclined to take action to help the economy, which in turn may affect the
dynamics of the unemployment rate series. Consequently, model (4.11) is capable
of describing the time-varying dynamics of the U.S. unemployment rate.
The MA representation of model (4.10) is

$$\psi(B) \approx 1 + 0.01B + 0.18B^2 + 0.20B^3 + 0.18B^4 + 0.15B^5 + \cdots.$$

Because the coefficient of $B$ is close to zero, it is not surprising to see that no
$y_{t-1}$ term appears in model (4.11).
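
The leading $\psi$ weights of model (4.10) can be reproduced with the function ARMAtoMA of the stats package. The sketch below uses only the regular AR(2) and MA(2) factors; this is our simplification, which is harmless here because the seasonal factors affect the weights only from lag 12 onward. Note the sign convention of R, in which MA coefficients enter with a plus sign.

```r
# Regular (nonseasonal) factors of model (4.10) in R's convention:
# x_t = ar1*x_{t-1} + ar2*x_{t-2} + e_t + ma1*e_{t-1} + ma2*e_{t-2}
psi <- ARMAtoMA(ar = c(1.13, -0.27), ma = c(-1.12, 0.44), lag.max = 5)

# psi[1] is the coefficient of B, psi[2] that of B^2, and so on.
print(round(psi, 2))
```

The first weight is about 0.01, so a shock has almost no effect on the series one month later.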

As mentioned in Chapter 3, threshold models can be used in finance to handle 
the asymmetric responses in volatility between positive and negative returns. The 
models can also be used to study arbitrage tradings in index futures and cash prices; 
see Chapter 8 on multivariate time series analysis. Here we focus on volatility 
modeling and introduce an alternative approach to parameterization of TGARCH 
models. In some applications, this new general TGARCH model fares better than 
the GJR model of Chapter 3. 


Example 4.3. Consider the daily log returns, in percentage and including 
dividends, of IBM stock from July 3, 1962, to December 31, 2003, for 10,446 
observations. Figure 4.3 shows the time plot of the series, which is one of the 
longer return series analyzed in the book. The volatility seems to be larger in the 
latter years of the data. Because general TGARCH models are used in the analysis, 
we use the SCA package to perform estimation in this example. 

If GARCH models of Chapter 3 are entertained, we obtain the following
AR(2)-GARCH(1,1) model for the series:

$$r_t = 0.062 - 0.024r_{t-2} + a_t, \quad a_t = \sigma_t\epsilon_t,$$
$$\sigma_t^2 = 0.037 + 0.077a_{t-1}^2 + 0.913\sigma_{t-1}^2, \tag{4.12}$$

where $r_t$ is the log return, $\{\epsilon_t\}$ is a Gaussian white noise sequence with mean
zero and variance 1.0, the standard errors of the parameters in the mean equation
are 0.015 and 0.010, and those of the volatility equation are 0.004, 0.003, and
0.003, respectively. All estimates are statistically significant at the 5% level.
The Ljung-Box statistics of the standardized residuals give Q(10) = 5.19(0.82)
and Q(20) = 24.38(0.18), where the number in parentheses denotes the $p$ value
obtained using a $\chi^2_{m-1}$ distribution because of the estimated AR(2) coefficient.
For the squared standardized residuals, we obtain Q(10) = 11.67(0.31) and
Q(20) = 18.25(0.57). The model is adequate in modeling the serial dependence
and conditional heteroscedasticity of the data. But the unconditional mean for
$r_t$ of model (4.12) is 0.060, which is substantially larger than the sample mean
0.039, indicating that the model might be misspecified.

Figure 4.3 Time plot of daily log returns for IBM stock from July 3, 1962, to December 31, 2003.

Next, we employ the TGARCH model of Chapter 3 and obtain

$$r_t = 0.014 - 0.028r_{t-2} + a_t, \quad a_t = \sigma_t\epsilon_t,$$
$$\sigma_t^2 = 0.075 + 0.081P_{t-1}a_{t-1}^2 + 0.157N_{t-1}a_{t-1}^2 + 0.863\sigma_{t-1}^2, \tag{4.13}$$

where $P_{t-1} = 1 - N_{t-1}$, $N_{t-1}$ is the indicator for negative $a_{t-1}$ such that $N_{t-1} = 1$
if $a_{t-1} < 0$ and $= 0$ otherwise, the standard errors of the parameters in the mean
equation are 0.013 and 0.009, and those of the volatility equation are 0.007,
0.008, 0.010, and 0.010, respectively. All estimates except the constant term of
the mean equation are significant. Let $\tilde{a}_t$ be the standardized residuals of model
(4.13). We have Q(10) = 2.47(0.98) and Q(20) = 25.90(0.13) for the $\{\tilde{a}_t\}$ series
and Q(10) = 97.07(0.00) and Q(20) = 170.3(0.00) for $\{\tilde{a}_t^2\}$. The model fails to
describe the conditional heteroscedasticity of the data.

The idea of TAR models can be used to refine the prior TGARCH model by 
allowing for increased flexibility in modeling the asymmetric response in volatility. 
More specifically, we consider an AR(2)—TAR—GARCH(1,1) model for the series 
and obtain 


$$r_t = 0.033 - 0.023r_{t-2} + a_t, \quad a_t = \sigma_t\epsilon_t,$$
$$\sigma_t^2 = 0.075 + 0.041a_{t-1}^2 + 0.903\sigma_{t-1}^2 + (0.030a_{t-1}^2 + 0.062\sigma_{t-1}^2)N_{t-1}, \tag{4.14}$$

where $N_{t-1}$ is defined in Eq. (4.13). All estimates in model (4.14) are significantly
different from zero at the usual 1% level. Let $\hat{a}_t$ be the standardized residuals of
model (4.14). We obtain Q(10) = 6.09(0.73) and Q(20) = 25.29(0.15) for $\{\hat{a}_t\}$
and Q(10) = 13.54(0.20) and Q(20) = 19.56(0.49) for $\{\hat{a}_t^2\}$. Thus, model (4.14)
is adequate in modeling the serial correlation and conditional heteroscedasticity
of the daily log returns of IBM stock considered. The unconditional mean return
of model (4.14) is 0.033, which is much closer to the sample mean 0.039 than
those implied by models (4.12) and (4.13). Comparing the two fitted TGARCH
models, we see that the asymmetric behavior in daily IBM stock volatility is much
stronger than what is allowed in a GJR model. Specifically, the coefficient of $\sigma_{t-1}^2$
also depends on the sign of $a_{t-1}$. Note that model (4.14) can be further refined by
imposing the constraint that the sum of the coefficients of $a_{t-1}^2$ and $\sigma_{t-1}^2$ is one
when $a_{t-1} < 0$.
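
To see how the sign of the prior shock enters the volatility equation of model (4.14), one can code a single step of the recursion. The sketch below is for illustration only: the function name sigma2_next is ours, and the shock size (2) and the prior variance (2.5) are hypothetical inputs, not estimates from the data.

```r
# One step of the volatility equation of model (4.14).
# a_prev is the prior shock a_{t-1}; s2_prev is the prior variance sigma_{t-1}^2.
sigma2_next <- function(a_prev, s2_prev) {
  N <- as.numeric(a_prev < 0)            # indicator of a negative prior shock
  0.075 + 0.041 * a_prev^2 + 0.903 * s2_prev +
    (0.030 * a_prev^2 + 0.062 * s2_prev) * N
}

# Hypothetical inputs: shocks of equal size and opposite sign.
up   <- sigma2_next( 2, 2.5)
down <- sigma2_next(-2, 2.5)
print(c(positive_shock = up, negative_shock = down))
```

The negative shock raises both the ARCH and the GARCH contributions, which is the extra asymmetry that the GJR form of model (4.13) does not allow.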


Remark. A RATS program to estimate the AR(2)—TAR—GARCH(1,1) model 
used is given in Appendix A. The results might be slightly different from those of 
SCA given in the text. 


4.1.3 Smooth Transition AR (STAR) Model 


A criticism of the SETAR model is that its conditional mean equation is not continuous.
The thresholds $\{\gamma_j\}$ are the discontinuity points of the conditional mean
function $\mu_t$. In response to this criticism, smooth TAR models have been proposed;
see Chan and Tong (1986) and Terasvirta (1994) and the references therein. A time
series $x_t$ follows a 2-regime STAR($p$) model if it satisfies

$$x_t = c_0 + \sum_{i=1}^{p}\phi_{0,i}x_{t-i} + F\left(\frac{x_{t-d} - \Delta}{s}\right)\left(c_1 + \sum_{i=1}^{p}\phi_{1,i}x_{t-i}\right) + a_t, \tag{4.15}$$

where $d$ is the delay parameter, $\Delta$ and $s$ are parameters representing the location and
scale of model transition, and $F(\cdot)$ is a smooth transition function. In practice, $F(\cdot)$
often assumes one of three forms, namely, logistic, exponential, or a cumulative
distribution function. From Eq. (4.15) and with $0 \le F(\cdot) \le 1$, the conditional mean
of a STAR model is a weighted linear combination between the following two
equations:

$$\mu_{1t} = c_0 + \sum_{i=1}^{p}\phi_{0,i}x_{t-i},$$

$$\mu_{2t} = (c_0 + c_1) + \sum_{i=1}^{p}(\phi_{0,i} + \phi_{1,i})x_{t-i}.$$


The weights are determined in a continuous manner by $F[(x_{t-d} - \Delta)/s]$. The
prior two equations also determine properties of a STAR model. For instance, a
prerequisite for the stationarity of a STAR model is that all zeros of both AR
polynomials are outside the unit circle. An advantage of the STAR model over
the TAR model is that the conditional mean function is differentiable. However,
experience shows that the transition parameters $\Delta$ and $s$ of a STAR model are hard
to estimate. In particular, most empirical studies show that standard errors of the
estimates of $\Delta$ and $s$ are often quite large, resulting in $t$ ratios of about 1.0; see
Terasvirta (1994). This uncertainty leads to various complications in interpreting
an estimated STAR model.
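
The weighting of the two regime equations can be sketched in R for a STAR(1) model with a logistic transition function and delay $d = 1$. All parameter values below are hypothetical and chosen only to make the behavior visible; the function names logistic_F and star_mean are ours.

```r
# Logistic transition function F[(z - Delta)/s], with values in (0, 1)
logistic_F <- function(z, Delta, s) 1 / (1 + exp(-(z - Delta) / s))

# Conditional mean of a 2-regime STAR(1) model with delay d = 1.
# All default parameter values are hypothetical.
star_mean <- function(x_lag, c0 = 0.5, phi0 = 0.6, c1 = -1.0, phi1 = -0.4,
                      Delta = 0, s = 0.5) {
  w   <- logistic_F(x_lag, Delta, s)
  mu1 <- c0 + phi0 * x_lag                     # limit when F = 0
  mu2 <- (c0 + c1) + (phi0 + phi1) * x_lag     # limit when F = 1
  (1 - w) * mu1 + w * mu2                      # same as Eq. (4.15) without a_t
}

print(star_mean(c(-50, 0, 50)))
```

Far below the location parameter the mean follows the first equation, far above it follows the second, and at $x_{t-1} = \Delta$ the two receive equal weight.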


Example 4.4. To illustrate the application of STAR models in financial time 
series analysis, we consider the monthly simple stock returns for Minnesota Mining 
and Manufacturing (3M) Company from February 1946 to December 2008. If 
ARCH models are entertained, we obtain the following ARCH(2) model: 


$$R_t = 0.013 + a_t, \quad a_t = \sigma_t\epsilon_t, \quad \sigma_t^2 = 0.003 + 0.088a_{t-1}^2 + 0.109a_{t-2}^2, \tag{4.16}$$

where standard errors of the estimates are 0.002, 0.0003, 0.047, and 0.050, respec- 
tively. As discussed before, such an ARCH model fails to show the asymmetric 
responses of stock volatility to positive and negative prior shocks. The STAR model 
provides a simple alternative that may overcome this difficulty. Applying STAR 
models to the monthly returns of 3M stock, we obtain the model 


$$R_t = 0.015 + a_t, \quad a_t = \sigma_t\epsilon_t,$$
$$\sigma_t^2 = (0.003 + 0.205a_{t-1}^2 + 0.092a_{t-2}^2) + \frac{0.001 - 0.239a_{t-1}^2}{1 + \exp(-1000a_{t-1})}, \tag{4.17}$$


where the standard error of the constant term in the mean equation is 0.002 and the
standard errors of the estimates in the volatility equation are 0.0002, 0.074, 0.043,
0.0004, and 0.080, respectively. The scale parameter 1000 of the logistic transition
function is fixed a priori to simplify the estimation. This STAR model provides
some support for asymmetric responses to positive and negative prior shocks. For
a large negative $a_{t-1}$, the volatility model approaches the ARCH(2) model

$$\sigma_t^2 = 0.003 + 0.205a_{t-1}^2 + 0.092a_{t-2}^2.$$

Yet for a large positive $a_{t-1}$, the volatility process behaves like the ARCH(2) model

$$\sigma_t^2 = 0.004 - 0.034a_{t-1}^2 + 0.092a_{t-2}^2.$$

The negative coefficient of $a_{t-1}^2$ in the prior model is counterintuitive, but the
magnitude is small. As a matter of fact, for a large positive shock $a_{t-1}$, the ARCH
effects appear to be weak even though the parameter estimates remain statistically
significant. The results shown are obtained using the command optim in R. A
RATS program for estimating the STAR model is given in Appendix A.


R Program for Estimating the STAR Model Used 


da=read.table("m-3m4608.txt",header=T)
rtn=da[,2]
# Star.R contains the function star() listed below.
source("Star.R")
# Initial values: mean, ARCH(2) part, and transition part of the variance.
par=c(.001,.002,.256,.141,.002,-.314)
m2=optim(par,star,method=c("BFGS"),hessian=T)

# function to calculate the negative log likelihood (up to a constant)
# of a STAR model; it uses the return series rtn defined above.
star <- function(par){
  f=0
  T1=length(rtn)
  h=c(1,1)
  at=c(0,0)
  for (t in 3:T1){
    resi=rtn[t]-par[1]
    at=c(at,resi)
    # ARCH(2) part and smooth-transition part of the variance
    sig=par[2]+par[3]*at[t-1]^2+par[4]*at[t-2]^2
    sig1=par[5]+par[6]*at[t-1]^2
    tt=sqrt(sig+sig1/(1+exp(-1000*at[t-1])))
    h=c(h,tt)
    x=resi/tt
    f=f+log(tt)+0.5*x*x
  }
  f
}


4.1.4 Markov Switching Model 


The idea of using probability switching in nonlinear time series analysis is discussed 
in Tong (1983). Using a similar idea, but emphasizing aperiodic transition between 
various states of an economy, Hamilton (1989) considers the Markov switching
autoregressive (MSA) model. Here the transition is driven by a hidden two-state
Markov chain. A time series $x_t$ follows an MSA model if it satisfies

$$x_t = \begin{cases} c_1 + \sum_{i=1}^{p}\phi_{1,i}x_{t-i} + a_{1t} & \text{if } s_t = 1, \\ c_2 + \sum_{i=1}^{p}\phi_{2,i}x_{t-i} + a_{2t} & \text{if } s_t = 2, \end{cases} \tag{4.18}$$

where $s_t$ assumes values in {1,2} and is a first-order Markov chain with transition
probabilities

$$P(s_t = 2 \mid s_{t-1} = 1) = w_1, \quad P(s_t = 1 \mid s_{t-1} = 2) = w_2.$$


The innovational series $\{a_{1t}\}$ and $\{a_{2t}\}$ are sequences of iid random variables with
mean zero and finite variance and are independent of each other. A small $w_i$
means that the model tends to stay longer in state $i$. In fact, $1/w_i$ is the expected
duration of the process to stay in state $i$. From the definition, an MSA model uses a
hidden Markov chain to govern the transition from one conditional mean function
to another. This is different from that of a SETAR model for which the transition is
determined by a particular lagged variable. Consequently, a SETAR model uses a
deterministic scheme to govern the model transition, whereas an MSA model uses a
stochastic scheme. In practice, the stochastic nature of the states implies that one is
never certain about which state $x_t$ belongs to in an MSA model. When the sample
size is large, one can use some filtering techniques to draw inference on the state of
$x_t$. Yet as long as $x_{t-d}$ is observed, the regime of $x_t$ is known in a SETAR model.
This difference has important practical implications in forecasting. For instance,
forecasts of an MSA model are always a linear combination of forecasts produced
by submodels of individual states. But those of a SETAR model only come from
a single regime provided that $x_{t-d}$ is observed. Forecasts of a SETAR model also
become a linear combination of those produced by models of individual regimes
when the forecast horizon exceeds the delay $d$. It is much harder to estimate
an MSA model than other models because the states are not directly observable. 
Hamilton (1990) uses the EM algorithm, which is a statistical method iterating 
between taking expectation and maximization. McCulloch and Tsay (1994) consider 
a Markov chain Monte Carlo (MCMC) method to estimate a general MSA model. 
We discuss MCMC methods in Chapter 12. 
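
The relation between the switching probability $w_i$ and the expected duration $1/w_i$ can be checked by simulating the hidden chain. The sketch below uses hypothetical values $w_1 = 0.1$ and $w_2 = 0.3$ and a fixed seed; it simulates only the state process $s_t$, not the series $x_t$.

```r
set.seed(1)
w <- c(0.1, 0.3)              # hypothetical switching probabilities w_1, w_2
n <- 20000
s <- integer(n)
s[1] <- 1L
for (t in 2:n) {
  leave <- runif(1) < w[s[t - 1]]              # leave state i with prob. w_i
  s[t] <- if (leave) 3L - s[t - 1] else s[t - 1]
}

# Average length of the consecutive stays in each state
runs <- rle(s)
dur  <- as.numeric(tapply(runs$lengths, runs$values, mean))
print(rbind(simulated = dur, theoretical = 1 / w))
```

The simulated average durations should be close to 10 and 3.33 periods, the values of $1/w_1$ and $1/w_2$.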

McCulloch and Tsay (1993) generalize the MSA model in Eq. (4.18) by letting
the transition probabilities $w_1$ and $w_2$ be logistic, or probit, functions of some
explanatory variables available at time $t - 1$. Chen, McCulloch, and Tsay (1997)
use the idea of Markov switching as a tool to perform model comparison and selec- 
tion between nonnested nonlinear time series models (e.g., comparing bilinear and 
SETAR models). Each competing model is represented by a state. This approach 
to select a model is a generalization of the odds ratio commonly used in Bayesian 
analysis. Finally, the MSA model can easily be generalized to the case of more 
than two states. The computational intensity involved increases rapidly, however. 
For more discussions of Markov switching models in econometrics, see Hamilton 
(1994, Chapter 22). 

Example 4.5. Consider the growth rate, in percentages, of the U.S. quarterly
real gross national product (GNP) from the second quarter of 1947 to the first 
quarter of 1991. The data are seasonally adjusted and shown in Figure 4.4, where 
a horizontal line of zero growth is also given. It is reassuring to see that a majority 
of the growth rates are positive. This series has been widely used in nonlinear 
analysis of economic time series. Tiao and Tsay (1994) and Potter (1995) use TAR 
models, whereas Hamilton (1989) and McCulloch and Tsay (1994) employ Markov 
switching models. 

Employing the MSA model in Eq. (4.18) with p = 4 and using a Markov chain 
Monte Carlo method, which is discussed in Chapter 12, McCulloch and Tsay (1994) 
obtain the estimates shown in Table 4.1. The results have several interesting findings.
First, the mean growth rate of the marginal model for state 1 is
$0.909/(1 - 0.265 - 0.029 + 0.126 + 0.11) = 0.965$ and that of state 2 is
$-0.42/(1 - 0.216 - 0.628 + 0.073 + 0.097) = -1.288$. Thus, state 1 corresponds to
quarters with positive growth, or expansion periods, whereas state 2 consists of
quarters with negative growth, or a contraction period. Second, the relatively large
posterior standard deviations of the parameters in state 2 reflect that there are few
observations in that state.
This is expected as Figure 4.4 shows few quarters with negative growth. Third, 
the transition probabilities appear to be different for different states. The estimates 
indicate that it is more likely for the U.S. GNP to get out of a contraction period 
than to jump into one: 0.286 versus 0.118. Fourth, treating $1/w_i$ as the expected
duration for the process to stay in state $i$, we see that the expected durations for
a contraction period and an expansion period are approximately 3.69 and 11.31
quarters. Thus, on average, a contraction in the U.S. economy lasts about a year,
whereas an expansion can last for 3 years. Finally, the estimated AR coefficients
of $x_{t-2}$ differ substantially between the two states, indicating that the dynamics of
the U.S. economy are different between expansion and contraction periods.

Figure 4.4 Time plot of growth rate of U.S. quarterly real GNP from 1947.II to 1991.I. Data are
seasonally adjusted and in percentages.

TABLE 4.1 Estimation Results of Markov Switching Model with p = 4 for Growth
Rate of U.S. Quarterly Real GNP, Seasonally Adjusted (a)

Parameter         c_i      phi_1    phi_2    phi_3    phi_4    sigma_i  w_i
State 1
Estimate          0.909    0.265    0.029   -0.126   -0.110    0.816    0.118
Standard Error    0.202    0.113    0.126    0.103    0.109    0.125    0.053
State 2
Estimate         -0.420    0.216    0.628   -0.073   -0.097    1.017    0.286
Standard Error    0.324    0.347    0.377    0.364    0.404    0.293    0.064

(a) The estimates and their standard errors are posterior means and standard errors of a Gibbs sampling
with 5000 iterations.
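
The marginal mean growth rates quoted in Example 4.5 follow directly from the estimates in Table 4.1. A minimal sketch of the arithmetic in R (the object names are ours):

```r
# Posterior-mean estimates taken from Table 4.1
c_const <- c(state1 = 0.909, state2 = -0.420)
phi <- rbind(state1 = c(0.265, 0.029, -0.126, -0.110),
             state2 = c(0.216, 0.628, -0.073, -0.097))

# Mean of a stationary AR(p) model: c / (1 - phi_1 - ... - phi_p)
mu <- c_const / (1 - rowSums(phi))
print(round(mu, 3))
```

The two values match the means of 0.965 and $-1.288$ given in the text.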


4.1.5 Nonparametric Methods 


In some financial applications, we may not have sufficient knowledge to prespecify 
the nonlinear structure between two variables Y and X. In other applications, we 
may wish to take advantage of the advances in computing facilities and compu- 
tational methods to explore the functional relationship between Y and X. These 
considerations lead to the use of nonparametric methods and techniques. Nonpara- 
metric methods, however, are not without cost. They are highly data dependent 
and can easily result in overfitting. Our goal here is to introduce some nonparamet- 
ric methods for financial applications and some nonlinear models that make use 
of nonparametric methods and techniques. The nonparametric methods discussed 
include kernel regression, local least-squares estimation, and neural network. 

The essence of nonparametric methods is smoothing. Consider two financial 
variables Y and X, which are related by 


$$Y_t = m(X_t) + a_t, \tag{4.19}$$

where $m(\cdot)$ is an arbitrary, smooth, but unknown function and $\{a_t\}$ is a white
noise sequence. We wish to estimate the nonlinear function $m(\cdot)$ from the data. For
simplicity, consider the problem of estimating $m(\cdot)$ at a particular date for which
$X = x$. That is, we are interested in estimating $m(x)$. Suppose that at $X = x$ we
have repeated independent observations $y_1, \ldots, y_T$. Then the data become

$$y_t = m(x) + a_t, \quad t = 1, \ldots, T.$$

Taking the average of the data, we have

$$\frac{\sum_{t=1}^{T}y_t}{T} = m(x) + \frac{\sum_{t=1}^{T}a_t}{T}.$$

By the law of large numbers, the average of the shocks converges to zero as $T$
increases. Therefore, the average $\bar{y} = (\sum_{t=1}^{T}y_t)/T$ is a consistent estimate of $m(x)$.
That the average $\bar{y}$ provides a consistent estimate of $m(x)$ or, alternatively, that the
average of shocks converges to zero shows the power of smoothing.

In financial time series, we do not have repeated observations available at $X = x$.
What we observed are $\{(y_t, x_t)\}$ for $t = 1, \ldots, T$. But if the function $m(\cdot)$ is
sufficiently smooth, then the value of $Y_t$ for which $X_t \approx x$ continues to provide
accurate approximation of $m(x)$. The value of $Y_t$ for which $X_t$ is far away from
$x$ provides less accurate approximation for $m(x)$. As a compromise, one can use a
weighted average of $y_t$ instead of the simple average to estimate $m(x)$. The weight
should be larger for those $Y_t$ with $X_t$ close to $x$ and smaller for those $Y_t$ with
$X_t$ far away from $x$. Mathematically, the estimate of $m(x)$ for a given $x$ can be
written as

$$\hat{m}(x) = \frac{1}{T}\sum_{t=1}^{T}w_t(x)y_t, \tag{4.20}$$

where the weights $w_t(x)$ are larger for those $y_t$ with $x_t$ close to $x$ and smaller for
those $y_t$ with $x_t$ far away from $x$. In Eq. (4.20), we assume that the weights sum
to $T$. One can treat $1/T$ as part of the weights and make the weights sum to one.

From Eq. (4.20), the estimate $\hat{m}(x)$ is simply a local weighted average with
weights determined by two factors. The first factor is the distance measure (i.e.,
the distance between $x_t$ and $x$). The second factor is the assignment of weight for
a given distance. Different ways to determine the distance between $x_t$ and $x$ and to
assign the weight using the distance give rise to different nonparametric methods.
In what follows, we discuss the commonly used kernel regression and local linear 
regression methods. 


Kernel Regression 

Kernel regression is perhaps the most commonly used nonparametric method in
smoothing. The weights here are determined by a kernel, which is typically a
probability density function, is denoted by $K(x)$, and satisfies

$$K(x) \ge 0, \quad \int K(z)\,dz = 1.$$

However, to increase the flexibility in distance measure, one often rescales the
kernel using a variable $h > 0$, which is referred to as the bandwidth. The rescaled
kernel becomes

$$K_h(x) = \frac{1}{h}K(x/h), \quad \int K_h(z)\,dz = 1. \tag{4.21}$$


The weight function can now be defined as

$$w_t(x) = \frac{K_h(x - x_t)}{\sum_{t=1}^{T}K_h(x - x_t)}, \tag{4.22}$$

where the denominator is a normalization constant that makes the smoother adaptive
to the local intensity of the $X$ variable and ensures the weights sum to one.
Plugging Eq. (4.22) into the smoothing formula (4.20), we have the well-known
Nadaraya-Watson kernel estimator

$$\hat{m}(x) = \sum_{t=1}^{T}w_t(x)y_t = \frac{\sum_{t=1}^{T}K_h(x - x_t)y_t}{\sum_{t=1}^{T}K_h(x - x_t)}; \tag{4.23}$$

see Nadaraya (1964) and Watson (1964). In practice, many choices are available
for the kernel $K(x)$. However, theoretical and practical considerations lead to a
few choices, including the Gaussian kernel

$$K_h(x) = \frac{1}{h\sqrt{2\pi}}\exp\left(-\frac{x^2}{2h^2}\right)$$

and the Epanechnikov kernel (Epanechnikov, 1969)

$$K_h(x) = \frac{0.75}{h}\left(1 - \frac{x^2}{h^2}\right)I\left(\left|\frac{x}{h}\right| \le 1\right),$$

where $I(A)$ is an indicator such that $I(A) = 1$ if $A$ holds and $I(A) = 0$ otherwise.
Figure 4.5 shows the Gaussian and Epanechnikov kernels for $h = 1$.

Figure 4.5 Standard normal kernel (solid line) and Epanechnikov kernel (dashed line) with bandwidth
h = 1.

To understand the role played by the bandwidth $h$, we evaluate the
Nadaraya-Watson estimator with the Epanechnikov kernel at the observed values
$\{x_t\}$ and consider two extremes. First, if $h \to 0$, then

$$\hat{m}(x_t) \to \frac{K_h(0)y_t}{K_h(0)} = y_t,$$

indicating that small bandwidths reproduce the data. Second, if $h \to \infty$, then

$$\hat{m}(x_t) \to \frac{\sum_{t=1}^{T}K_h(0)y_t}{\sum_{t=1}^{T}K_h(0)} = \frac{1}{T}\sum_{t=1}^{T}y_t = \bar{y},$$

suggesting that large bandwidths lead to an oversmoothed curve, namely the sample mean.
In general, the bandwidth $h$ acts as follows. If $h$ is very small, then the
weights focus on a few observations that are in the neighborhood around each $x_t$.
If $h$ is very large, then the weights will spread over a larger neighborhood of $x_t$.
Consequently, the choice of $h$ plays an important role in kernel regression. This is
the well-known problem of bandwidth selection in kernel regression.
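
The two extremes can be verified numerically. The following sketch implements the Nadaraya-Watson estimator with the Epanechnikov kernel and applies it to simulated data on an equally spaced grid; the data-generating function, the seed, and the function names epan_kernel and nw are our assumptions for illustration.

```r
# Epanechnikov kernel K_h(u) with bandwidth h
epan_kernel <- function(u, h) 0.75 / h * (1 - (u / h)^2) * (abs(u / h) <= 1)

# Nadaraya-Watson estimate of m(x0) from data (x, y)
nw <- function(x0, x, y, h, kernel = epan_kernel) {
  sapply(x0, function(z) {
    k <- kernel(z - x, h)          # kernel weights K_h(z - x_t)
    sum(k * y) / sum(k)
  })
}

set.seed(42)
x <- seq(-2, 2, length.out = 201)              # grid spacing is 0.02
y <- sin(x) + rnorm(length(x), sd = 0.2)       # simulated data

fit_small <- nw(x, x, y, h = 1e-3)   # bandwidth far below the grid spacing
fit_large <- nw(x, x, y, h = 1e3)    # bandwidth far above the data range
print(max(abs(fit_small - y)))
print(max(abs(fit_large - mean(y))))
```

With the tiny bandwidth each fitted value equals the observation itself, and with the huge bandwidth every fitted value is essentially the sample mean. Because the Epanechnikov kernel has bounded support, the estimate is undefined at any point that has no observation within distance $h$.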


Bandwidth Selection 

There are several approaches for bandwidth selection; see Härdle (1990) and Fan
and Yao (2003). The first approach is the plug-in method, which is based on
the asymptotic expansion of the mean integrated squared error (MISE) for kernel
smoothers

$$\text{MISE} = E\int_{-\infty}^{\infty}[\hat{m}(x) - m(x)]^2\,dx,$$

where $m(\cdot)$ is the true function. The quantity $E[\hat{m}(x) - m(x)]^2$ of the MISE is a
pointwise measure of the mean squared error (MSE) of $\hat{m}(x)$ evaluated at $x$. Under
some regularity conditions, one can derive the optimal bandwidth that minimizes
the MISE. The optimal bandwidth typically depends on several unknown quantities
that must be estimated from the data with some preliminary smoothing. Several
iterations are often needed to obtain a reasonable estimate of the optimal bandwidth.
In practice, the choice of preliminary smoothing can become a problem. Fan and
Yao (2003) give a normal reference bandwidth selector as

$$\hat{h}_{\text{opt}} = \begin{cases} 1.06sT^{-1/5} & \text{for the Gaussian kernel}, \\ 2.34sT^{-1/5} & \text{for the Epanechnikov kernel}, \end{cases}$$

where $s$ is the sample standard error of the independent variable, which is assumed
to be stationary.

The second approach to bandwidth selection is the leave-one-out cross validation.
First, one observation $(x_j, y_j)$ is left out. The remaining $T - 1$ data points
are used to obtain the following smoother at $x_j$:

$$\hat{m}_{h,j}(x_j) = \frac{1}{T - 1}\sum_{t \ne j}w_t(x_j)y_t,$$

which is an estimate of $y_j$, where the weights $w_t(x_j)$ sum to $T - 1$. Second,
perform step 1 for $j = 1, \ldots, T$ and define the function

$$\text{CV}(h) = \frac{1}{T}\sum_{j=1}^{T}[y_j - \hat{m}_{h,j}(x_j)]^2 W(x_j),$$

where $W(\cdot)$ is a nonnegative weight function satisfying $\sum_{j=1}^{T}W(x_j) = T$, that
can be used to down-weight the boundary points if necessary. Decreasing the
weights assigned to data points close to the boundary is needed because those
points often have fewer neighboring observations. The function CV($h$) is called
the cross-validation function because it validates the ability of the smoother to
predict $\{y_t\}_{t=1}^{T}$. One chooses the bandwidth $h$ that minimizes the CV($\cdot$) function.
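
Both selectors are easy to try on simulated data. The sketch below computes the normal reference bandwidth for the Gaussian kernel and minimizes the leave-one-out cross-validation function over a grid of bandwidths. It takes $W(x_j) = 1$ for all $j$; the data-generating function, the seed, the bandwidth grid, and the function names are assumptions made for illustration.

```r
# Normal reference bandwidth for the Gaussian kernel: 1.06 * s * T^(-1/5)
h_ref <- function(x) 1.06 * sd(x) * length(x)^(-1/5)

# Leave-one-out cross-validation function CV(h) with W(x_j) = 1
cv_score <- function(h, x, y) {
  err <- sapply(seq_along(x), function(j) {
    k <- dnorm((x[j] - x[-j]) / h)     # Gaussian weights without point j
    y[j] - sum(k * y[-j]) / sum(k)     # prediction error for y_j
  })
  mean(err^2)
}

set.seed(7)
x <- runif(150, -2, 2)
y <- sin(2 * x) + rnorm(150, sd = 0.3)         # simulated data

grid <- seq(0.05, 1.5, by = 0.05)              # candidate bandwidths
cv   <- sapply(grid, cv_score, x = x, y = y)
h_cv <- grid[which.min(cv)]
print(c(reference = h_ref(x), cv = h_cv))
```

The normal reference rule is derived for estimating a smooth density-like target, so it need not agree with the cross-validation choice when the regression function has substantial curvature.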


Local Linear Regression Method 

Assume that the second derivative of $m(\cdot)$ in model (4.19) exists and is continuous
at $x$, where $x$ is a given point in the support of $m(\cdot)$. Denote the data available by
$\{(y_t, x_t)\}_{t=1}^{T}$. The local linear regression method to nonparametric regression is to
find $a$ and $b$ that minimize

$$L(a, b) = \sum_{t=1}^{T}[y_t - a - b(x - x_t)]^2K_h(x - x_t), \tag{4.24}$$

where $K_h(\cdot)$ is a kernel function defined in Eq. (4.21) and $h$ is a bandwidth. Denote
the resulting value of $a$ by $\hat{a}$. The estimate of $m(x)$ is then defined as $\hat{a}$. In practice,
$x$ assumes an observed value of the independent variable. The estimate $\hat{b}$ can be
used as an estimate of the first derivative of $m(\cdot)$ evaluated at $x$.

Under the least-squares theory, Eq. (4.24) is a weighted least-squares problem
and one can derive a closed-form solution for $a$. Specifically, taking the partial
derivatives of $L(a, b)$ with respect to both $a$ and $b$ and equating the derivatives to
zero, we have a system of two equations with two unknowns:

$$\sum_{t=1}^{T}y_tK_h(x - x_t) = a\sum_{t=1}^{T}K_h(x - x_t) + b\sum_{t=1}^{T}(x - x_t)K_h(x - x_t),$$
$$\sum_{t=1}^{T}y_t(x - x_t)K_h(x - x_t) = a\sum_{t=1}^{T}(x - x_t)K_h(x - x_t) + b\sum_{t=1}^{T}(x - x_t)^2K_h(x - x_t).$$

Define

$$s_{T,\ell} = \sum_{t=1}^{T}K_h(x - x_t)(x - x_t)^{\ell}, \quad \ell = 0, 1, 2.$$

The prior system of equations becomes

$$\begin{bmatrix} s_{T,0} & s_{T,1} \\ s_{T,1} & s_{T,2} \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum_{t=1}^{T}K_h(x - x_t)y_t \\ \sum_{t=1}^{T}(x - x_t)K_h(x - x_t)y_t \end{bmatrix}.$$

Consequently, we have

$$\hat{a} = \frac{s_{T,2}\sum_{t=1}^{T}K_h(x - x_t)y_t - s_{T,1}\sum_{t=1}^{T}(x - x_t)K_h(x - x_t)y_t}{s_{T,0}s_{T,2} - s_{T,1}^2}.$$


The numerator and denominator of the prior fraction can be further simplified as

$$s_{T,2}\sum_{t=1}^{T}K_h(x - x_t)y_t - s_{T,1}\sum_{t=1}^{T}(x - x_t)K_h(x - x_t)y_t = \sum_{t=1}^{T}\{K_h(x - x_t)[s_{T,2} - (x - x_t)s_{T,1}]\}y_t,$$

$$s_{T,0}s_{T,2} - s_{T,1}^2 = \sum_{t=1}^{T}K_h(x - x_t)s_{T,2} - \sum_{t=1}^{T}(x - x_t)K_h(x - x_t)s_{T,1} = \sum_{t=1}^{T}K_h(x - x_t)[s_{T,2} - (x - x_t)s_{T,1}].$$


In summary, we have

$$\hat{a} = \frac{\sum_{t=1}^{T}w_ty_t}{\sum_{t=1}^{T}w_t}, \tag{4.25}$$

where $w_t$ is defined as

$$w_t = K_h(x - x_t)[s_{T,2} - (x - x_t)s_{T,1}].$$

In practice, to avoid possible zero in the denominator, we use the following $\hat{m}(x)$
to estimate $m(x)$:

$$\hat{m}(x) = \frac{\sum_{t=1}^{T}w_ty_t}{\sum_{t=1}^{T}w_t + 1/T^2}. \tag{4.26}$$

Notice that a nice feature of Eq. (4.26) is that the weight $w_t$ satisfies

$$\sum_{t=1}^{T}(x - x_t)w_t = 0.$$


Also, if one assumes that $m(\cdot)$ of Eq. (4.19) has the first derivative and finds the
minimizer of

$$\sum_{t=1}^{T}(y_t - a)^2K_h(x - x_t),$$

then the resulting estimator is the Nadaraya-Watson estimator mentioned earlier.
In general, if one assumes that $m(x)$ has a bounded $k$th derivative, then one can
replace the linear polynomial in Eq. (4.24) by a $(k - 1)$-order polynomial. We refer
to the estimator in Eq. (4.26) as the local linear regression smoother. Fan (1993)
shows that, under some regularity conditions, the local linear regression estimator
has some important sampling properties. The selection of bandwidth can be carried
out via the same methods as before.
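
The local linear regression smoother of Eq. (4.26) takes only a few lines of R. The sketch below uses the Gaussian kernel and a noiseless straight line as test data, which are our choices for illustration; the function name local_linear is also ours.

```r
# Local linear estimate of m(x0), following Eqs. (4.24)-(4.26)
local_linear <- function(x0, x, y, h) {
  Tn <- length(x)
  sapply(x0, function(z) {
    d  <- z - x                        # distances x - x_t
    k  <- dnorm(d / h) / h             # Gaussian kernel K_h(x - x_t)
    s1 <- sum(k * d)                   # s_{T,1}
    s2 <- sum(k * d^2)                 # s_{T,2}
    w  <- k * (s2 - d * s1)            # weights w_t of Eq. (4.25)
    sum(w * y) / (sum(w) + 1 / Tn^2)   # Eq. (4.26)
  })
}

x <- seq(0, 1, length.out = 101)
y <- 2 + 3 * x                         # noiseless straight line
fit <- local_linear(c(0, 0.5, 1), x, y, h = 0.1)
print(fit)
```

Because the weights satisfy $\sum_{t}(x - x_t)w_t = 0$, a linear function is reproduced up to the small $1/T^2$ adjustment in the denominator, even at the boundary points 0 and 1. The Nadaraya-Watson estimator does not share this property near the boundary.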


Time Series Application 

In time series analysis, the explanatory variables are often the lagged values of
the series. Consider the simple case of a single explanatory variable. Here model
(4.19) becomes

$$x_t = m(x_{t-1}) + a_t,$$

and the kernel regression and local linear regression method discussed before are
directly applicable. When multiple explanatory variables exist, some modifications
are needed to implement the nonparametric methods. For the kernel regression, one
can use a multivariate kernel such as a multivariate normal density function with
a prespecified covariance matrix:

$$K_h(\mathbf{x}) = \frac{1}{(h\sqrt{2\pi})^p|\Sigma|^{1/2}}\exp\left(-\frac{1}{2h^2}\mathbf{x}'\Sigma^{-1}\mathbf{x}\right),$$

where $p$ is the number of explanatory variables and $\Sigma$ is a prespecified positive-definite
matrix. Alternatively, one can use the product of univariate kernel functions
as a multivariate kernel, for example,

$$K_h(\mathbf{x}) = \prod_{i=1}^{p}\frac{0.75}{h}\left(1 - \frac{x_i^2}{h^2}\right)I\left(\left|\frac{x_i}{h}\right| \le 1\right).$$

This latter approach is simple, but it overlooks the relationship between the explanatory
variables.

Example 4.6. To illustrate the application of nonparametric methods in finance,
consider the weekly 3-month Treasury bill secondary market rate from 1970 to 1997
for 1461 observations. The data are obtained from the Federal Reserve Bank of St.
Louis and are shown in Figure 4.6. This series has been used in the literature as
an example of estimating stochastic diffusion equations using discretely observed
data. See references in Chapter 6. Here we consider a simple model

$$y_t = \mu(x_{t-1})\,dt + \sigma(x_{t-1})\,dw_t,$$

where $x_t$ is the 3-month Treasury bill rate, $y_t = x_t - x_{t-1}$, $w_t$ is a standard Brownian
motion, and $\mu(\cdot)$ and $\sigma(\cdot)$ are smooth functions of $x_{t-1}$, and apply the local
smoothing function lowess of R or S-Plus to obtain nonparametric estimates of
$\mu(\cdot)$ and $\sigma(\cdot)$; see Cleveland (1979). For simplicity, we use $|y_t|$ as a proxy of the
volatility of $x_t$.

For the simple model considered, $\mu(x_{t-1})$ is the conditional mean of $y_t$ given
$x_{t-1}$, that is, $\mu(x_{t-1}) = E(y_t \mid x_{t-1})$. Figure 4.7(a) shows the scatterplot of $y_t$
versus $x_{t-1}$. The plot also contains the local smooth estimate of $\mu(x_{t-1})$ obtained
by lowess of R or S-Plus. The estimate is essentially zero. However, to better
understand the estimate, Figure 4.7(b) shows the estimate $\hat{\mu}(x_{t-1})$ on a finer scale.
It is interesting to see that $\hat{\mu}(x_{t-1})$ is positive when $x_{t-1}$ is small but becomes
negative when $x_{t-1}$ is large. This is in agreement with the common sense that
when the interest rate is high, it is expected to come down, and when the rate is
low, it is expected to increase. Figure 4.7(c) shows the scatterplot of $|y_t|$ versus
$x_{t-1}$ and the estimate of $\hat{\sigma}(x_{t-1})$ via lowess. The plot confirms that the higher the
interest rate, the larger the volatility. Figure 4.7(d) shows the estimate $\hat{\sigma}(x_{t-1})$ on
a finer scale. Clearly, the volatility is an increasing function of $x_{t-1}$ and the slope
seems to accelerate when $x_{t-1}$ is approaching 10%. This example demonstrates
that simple nonparametric methods can be helpful in understanding the dynamic
structure of a financial time series.

Figure 4.6 Time plot of U.S. weekly 3-month Treasury bill rate in secondary market from 1970 to
1997.

Figure 4.7 Estimation of conditional mean and volatility of weekly 3-month Treasury bill rate via a
local smoothing method: (a) $y_t$ vs. $x_{t-1}$, where $y_t = x_t - x_{t-1}$ and $x_t$ is interest rate; (b) estimate of
$\mu(x_{t-1})$; (c) $|y_t|$ vs. $x_{t-1}$; and (d) estimate of $\sigma(x_{t-1})$.

R and S-Plus Commands Used in Example 4.6 


z1=read.table('w-3mtbs7097.txt',header=T)
# The rate is assumed to be in column 4 of the data frame; rows index the weeks.
x=z1[1:1460,4]/100
y=(z1[2:1461,4]-z1[1:1460,4])/100
par(mfcol=c(2,2))
plot(x,y,pch='*',xlab='x(t-1)',ylab='y(t)')
lines(lowess(x,y))
title(main='(a) y(t) vs x(t-1)')
fit=lowess(x,y)
plot(fit$x,fit$y,xlab='x(t-1)',ylab='mu',type='l',
  ylim=c(-.002,.002))
title(main='(b) Estimate of mu(.)')
plot(x,abs(y),pch='*',xlab='x(t-1)',ylab='abs(y)')
lines(lowess(x,abs(y)))
title(main='(c) abs(y) vs x(t-1)')
fit2=lowess(x,abs(y))
plot(fit2$x,fit2$y,type='l',xlab='x(t-1)',ylab='sigma',
  ylim=c(0,.01))
title(main='(d) Estimate of sigma(.)')


The following nonlinear models are derived with the help of nonparametric 
methods. 


4.1.6 Functional Coefficient AR Model 


Recent advances in nonparametric techniques enable researchers to relax parametric 
constraints in proposing nonlinear models. In some cases, nonparametric methods 
are used in a preliminary study to help select a parametric nonlinear model. This is 
the approach taken by Chen and Tsay (1993a) in proposing the functional coefficient 
autoregressive (FAR) model that can be written as 


$$x_t = f_1(X_{t-1})\,x_{t-1} + \cdots + f_p(X_{t-1})\,x_{t-p} + a_t, \tag{4.27}$$


where $X_{t-1} = (x_{t-1}, \ldots, x_{t-k})'$ is a vector of lagged values of $x_t$. If necessary,
$X_{t-1}$ may also include other explanatory variables available at time $t-1$. The
functions $f_i(\cdot)$ of Eq. (4.27) are assumed to be continuous, even twice differentiable,
almost surely with respect to their arguments. Most of the nonlinear models
discussed before are special cases of the FAR model. In application, one can use
nonparametric methods such as kernel regression or local linear regression to estimate
the functional coefficients $f_i(\cdot)$, especially when the dimension of $X_{t-1}$ is low
(e.g., $X_{t-1}$ is a scalar). Recently, Cai, Fan, and Yao (2000) applied the local linear
regression method to estimate $f_i(\cdot)$ and showed that substantial improvements in
1-step-ahead forecasts can be achieved by using FAR models.
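
As a rough illustration of kernel estimation of a functional coefficient, the sketch below simulates a series of the form $x_t = f_1(x_{t-2})\,x_{t-1} + a_t$ and recovers $f_1$ by kernel-weighted least squares. The coefficient function `f1`, the bandwidth `h`, and the helper `far_coef` are assumptions made for this illustration; they are not taken from Chen and Tsay (1993a) or Cai, Fan, and Yao (2000), and a local-constant fit is used in place of local linear regression for brevity.

```r
# Simulate x_t = f1(x_{t-2}) * x_{t-1} + a_t with a hypothetical coefficient function.
set.seed(1)
f1 <- function(u) 0.3 + 0.5 * exp(-u^2)   # assumed for illustration only
n  <- 2000
x  <- numeric(n)
a  <- rnorm(n)
for (t in 3:n) x[t] <- f1(x[t - 2]) * x[t - 1] + a[t]

y  <- x[3:n]          # response x_t
x1 <- x[2:(n - 1)]    # regressor x_{t-1}
u  <- x[1:(n - 2)]    # argument of the coefficient function, x_{t-2}

# Local-constant estimate of f1 at u0: minimize over b
#   sum_t K((u_t - u0)/h) * (y_t - b * x1_t)^2, with a Gaussian kernel K.
far_coef <- function(u0, h = 0.3) {
  w <- dnorm((u - u0) / h)
  sum(w * y * x1) / sum(w * x1^2)
}

grid <- seq(-2, 2, by = 0.5)
est  <- sapply(grid, far_coef)
round(cbind(u = grid, estimate = est, truth = f1(grid)), 3)
```

The estimate is least reliable where few observations of $x_{t-2}$ fall near the evaluation point, which is why low-dimensional arguments are emphasized in the text.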


4.1.7 Nonlinear Additive AR Model 


A major difficulty in applying nonparametric methods to nonlinear time series analysis
is the “curse of dimensionality.” Consider a general nonlinear AR(p) process
$x_t = f(x_{t-1}, \ldots, x_{t-p}) + a_t$. A direct application of nonparametric methods to estimate
$f(\cdot)$ would require p-dimensional smoothing, which is hard to do when p is
large, especially if the number of data points is not large. A simple, yet effective
way to overcome this difficulty is to entertain an additive model that only requires
lower dimensional smoothing. A time series $x_t$ follows a nonlinear additive AR
(NAAR) model if


$$x_t = f_0(t) + \sum_{i=1}^{p} f_i(x_{t-i}) + a_t, \tag{4.28}$$


where the $f_i(\cdot)$ are continuous functions almost surely. Because each function $f_i(\cdot)$
has a single argument, it can be estimated nonparametrically using one-dimensional
smoothing techniques and hence avoids the curse of dimensionality. In application,
an iterative estimation method that estimates $f_i(\cdot)$ nonparametrically conditioned
on estimates of $f_j(\cdot)$ for all $j \neq i$ is used to estimate a NAAR model; see Chen
and Tsay (1993b) for further details and examples of NAAR models.

The additivity assumption is rather restrictive and needs to be examined carefully 
in application. Chen, Liu, and Tsay (1995) consider test statistics for checking the 
additivity assumption. 
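
The iterative estimation idea can be sketched with a simple backfitting loop. The code below is only an illustration under stated assumptions: the simulated model, the use of `loess` as the one-dimensional smoother, the span, and the fixed number of iterations are all choices made here, not the procedure of Chen and Tsay (1993b).

```r
# Simulate a NAAR(2) series with hypothetical component functions.
set.seed(2)
n <- 1000
x <- numeric(n)
a <- rnorm(n, sd = 0.5)
for (t in 3:n) x[t] <- 0.8 * tanh(x[t - 1]) - 0.5 * sin(x[t - 2]) + a[t]

y  <- x[3:n]           # x_t
l1 <- x[2:(n - 1)]     # x_{t-1}
l2 <- x[1:(n - 2)]     # x_{t-2}

# One-dimensional smoother: local linear loess, centered to have mean zero.
smooth1d <- function(z, r) {
  g <- fitted(loess(r ~ z, span = 0.5, degree = 1))
  g - mean(g)
}

# Backfitting: update each component given the current estimate of the other.
f0 <- mean(y)
g1 <- g2 <- rep(0, length(y))
for (iter in 1:10) {
  g1 <- smooth1d(l1, y - f0 - g2)
  g2 <- smooth1d(l2, y - f0 - g1)
}
res <- y - f0 - g1 - g2
c(sd_y = sd(y), sd_residual = sd(res))
```

Each pass requires only one-dimensional smoothing, which is the point of the additive structure.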


4.1.8 Nonlinear State-Space Model 


Making use of recent advances in MCMC methods (Gelfand and Smith, 1990),
Carlin, Polson, and Stoffer (1992) propose a Monte Carlo approach for nonlinear
state-space modeling. The model considered is


$$S_t = f_t(S_{t-1}) + u_t, \qquad x_t = g_t(S_t) + v_t, \tag{4.29}$$


where $S_t$ is the state vector, $f_t(\cdot)$ and $g_t(\cdot)$ are known functions depending on some
unknown parameters, $\{u_t\}$ is a sequence of iid multivariate random vectors with
zero mean and nonnegative definite covariance matrix $\Sigma_u$, $\{v_t\}$ is a sequence of
iid random variables with mean zero and variance $\sigma_v^2$, and $\{u_t\}$ is independent of
$\{v_t\}$. Monte Carlo techniques are employed to handle the nonlinear evolution of the
state transition equation because the whole conditional distribution function of $S_t$
given $S_{t-1}$ is needed for a nonlinear system. Other numerical smoothing methods
for nonlinear time series analysis have been considered by Kitagawa (1998) and the
references therein. MCMC methods (or computing-intensive numerical methods)
are powerful tools for nonlinear time series analysis. Their potential has not been
fully explored. However, the assumption of knowing $f_t(\cdot)$ and $g_t(\cdot)$ in model (4.29)
may hinder practical use of the proposed method. A possible solution to overcome
this limitation is to use nonparametric methods such as the analyses considered in
FAR and NAAR models to specify $f_t(\cdot)$ and $g_t(\cdot)$ before using nonlinear state-space
models.


4.1.9 Neural Networks 


A popular topic in modern data analysis is neural networks, which can be classified
as a semiparametric method. The literature on neural networks is enormous, and
its application spreads over many scientific areas with varying degrees of success;
see Section 2 of Ripley (1993) for a list of applications and Section 10 for remarks
concerning its application in finance. Cheng and Titterington (1994) provide information
on neural networks from a statistical viewpoint. In this subsection, we focus
solely on the feed-forward neural networks in which inputs are connected to one
or more neurons, or nodes, in the input layer, and these nodes are connected
forward to further layers until they reach the output layer. Figure 4.8 shows an
example of a simple feed-forward network for univariate time series analysis with
one hidden layer. The input layer has two nodes, and the hidden layer has three.
The input nodes are connected forward to each and every node in the hidden layer,
and these hidden nodes are connected to the single node in the output layer. We call
the network a 2-3-1 feed-forward network. More complicated neural networks,
including those with feedback connections, have been proposed in the literature,
but the feed-forward networks are most relevant to our study.

Figure 4.8 Feed-forward neural network with one hidden layer for univariate time series analysis.


Feed-Forward Neural Networks 
A neural network processes information from one layer to the next by an “activation 
function.” Consider a feed-forward network with one hidden layer. The jth node 
in the hidden layer is defined as 


$$h_j = f_j\left(\alpha_{0j} + \sum_{i \to j} w_{ij} x_i\right), \tag{4.30}$$

where $x_i$ is the value of the $i$th input node, $f_j(\cdot)$ is an activation function typically
taken to be the logistic function

$$f_j(z) = \frac{\exp(z)}{1 + \exp(z)},$$

$\alpha_{0j}$ is called the bias, the summation $i \to j$ means summing over all input nodes
feeding to $j$, and $w_{ij}$ are the weights. For illustration, the $j$th node of the hidden
layer of the 2-3-1 feed-forward network in Figure 4.8 is


$$h_j = \frac{\exp(\alpha_{0j} + w_{1j}x_1 + w_{2j}x_2)}{1 + \exp(\alpha_{0j} + w_{1j}x_1 + w_{2j}x_2)}, \qquad j = 1, 2, 3. \tag{4.31}$$

For the output layer, the node is defined as

$$o = f_o\left(\alpha_{0o} + \sum_{j \to o} w_{jo} h_j\right), \tag{4.32}$$

where the activation function $f_o(\cdot)$ is either linear or a Heaviside function. If $f_o(\cdot)$
is linear, then

$$o = \alpha_{0o} + \sum_{j=1}^{k} w_{jo} h_j,$$

where $k$ is the number of nodes in the hidden layer. By a Heaviside function,
we mean $f_o(z) = 1$ if $z > 0$ and $f_o(z) = 0$ otherwise. A neuron with a Heaviside
function is called a threshold neuron, with 1 denoting that the neuron fires its
message. For example, the output of the 2-3-1 network in Figure 4.8 is

$$o = \alpha_{0o} + w_{1o}h_1 + w_{2o}h_2 + w_{3o}h_3,$$

if the activation function is linear; it is

$$o = \begin{cases} 1 & \text{if } \alpha_{0o} + w_{1o}h_1 + w_{2o}h_2 + w_{3o}h_3 > 0, \\ 0 & \text{if } \alpha_{0o} + w_{1o}h_1 + w_{2o}h_2 + w_{3o}h_3 \le 0, \end{cases}$$

if $f_o(\cdot)$ is a Heaviside function.
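Combining the layers, the output of a feed-forward neural network can be written as

$$o = f_o\left[\alpha_{0o} + \sum_{j \to o} w_{jo}\, f_j\left(\alpha_{0j} + \sum_{i \to j} w_{ij} x_i\right)\right]. \tag{4.33}$$

If one also allows for direct connections from the input layer to the output layer,
then the network becomes

$$o = f_o\left[\alpha_{0o} + \sum_{i \to o} \alpha_{io} x_i + \sum_{j \to o} w_{jo}\, f_j\left(\alpha_{0j} + \sum_{i \to j} w_{ij} x_i\right)\right], \tag{4.34}$$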


where the first summation is summing over the input nodes. When the activation
function of the output layer is linear, the direct connections from the input nodes
to the output node represent a linear function between the inputs and output. Consequently,
in this particular case model (4.34) is a generalization of linear models.
For the 2-3-1 network in Figure 4.8, if the output activation function is linear,
then Eq. (4.33) becomes

$$o = \alpha_{0o} + \sum_{j=1}^{3} w_{jo} h_j,$$

where $h_j$ is given in Eq. (4.31). The network thus has 13 parameters. If Eq. (4.34)
is used, then the network becomes

$$o = \alpha_{0o} + \sum_{i=1}^{2} \alpha_{io} x_i + \sum_{j=1}^{3} w_{jo} h_j,$$

where again $h_j$ is given in Eq. (4.31). The number of parameters of the network
increases to 15.

We refer to the function in Eq. (4.33) or (4.34) as a semiparametric function 
because its functional form is known, but the number of nodes and their biases and 
weights are unknown. The direct connections from the input layer to the output 
layer in Eq. (4.34) mean that the network can skip the hidden layer. We refer to 
such a network as a skip-layer feed-forward network. 

Feed-forward networks are known as multilayer perceptrons in the neural network
literature. They can approximate any continuous function uniformly on compact sets
by increasing the number of nodes in the hidden layer; see Hornik, Stinchcombe,
and White (1989), Hornik (1993), and Chen and Chen (1995). This property of neural
networks is the universal approximation property of the multilayer perceptrons.
In short, feed-forward neural networks with a hidden layer can be seen as a way
to parameterize a general continuous nonlinear function.
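
To make the formulas concrete, the following sketch evaluates the output of a one-hidden-layer network with a linear output node, optionally with skip-layer connections as in Eq. (4.34), and counts its parameters. The function names `ffnet_output` and `n_params` and the argument layout are hypothetical, introduced only for this illustration.

```r
# Logistic activation for the hidden layer, as in the text.
logistic <- function(z) exp(z) / (1 + exp(z))

# Output of a feed-forward network with one hidden layer and a linear output node.
#   x      : input vector (length n_in)
#   a0h    : hidden-node biases (length n_hid)
#   W      : n_in x n_hid matrix of input-to-hidden weights w_ij
#   a0o    : output bias
#   w_o    : hidden-to-output weights (length n_hid)
#   a_skip : direct input-to-output weights; zeros give Eq. (4.33)
ffnet_output <- function(x, a0h, W, a0o, w_o, a_skip = rep(0, length(x))) {
  h <- logistic(a0h + as.vector(t(W) %*% x))   # hidden nodes, Eq. (4.30)
  a0o + sum(a_skip * x) + sum(w_o * h)         # linear output node
}

# Number of biases and weights in an n_in - n_hid - 1 network.
n_params <- function(n_in, n_hid, skip = FALSE) {
  n_hid * (n_in + 1) + (n_hid + 1) + if (skip) n_in else 0
}

n_params(2, 3)                # 2-3-1 network: 13
n_params(2, 3, skip = TRUE)   # with skip layer: 15

# With all hidden weights zero, each hidden node equals logistic(0) = 0.5.
out <- ffnet_output(x = c(1, 2), a0h = rep(0, 3), W = matrix(0, 2, 3),
                    a0o = 0.5, w_o = c(1, 1, 1))
out                           # 0.5 + 3 * 0.5 = 2
```

The counts agree with the 13 and 15 parameters quoted above for the 2-3-1 network.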


Training and Forecasting 
Application of neural networks involves two steps. The first step is to train the 
network (i.e., to build a network, including determining the number of nodes and 
estimating their biases and weights). The second step is inference, especially fore- 
casting. The data are often divided into two nonoverlapping subsamples in the 
training stage. The first subsample is used to estimate the parameters of a given 
feed-forward neural network. The network so built is then used in the second sub- 
sample to perform forecasting and compute its forecasting accuracy. By comparing 
the forecasting performance, one selects the network that outperforms the others 
as the “best” network for making inference. This is the idea of cross validation 
widely used in statistical model selection. Other model selection methods are also 
available. 

In a time series application, let $\{(r_t, \mathbf{x}_t) \mid t = 1, \ldots, T\}$ be the available data for
network training, where $\mathbf{x}_t$ denotes the vector of inputs and $r_t$ is the series of
interest (e.g., log returns of an asset). For a given network, let $o_t$ be the output of
the network with input $\mathbf{x}_t$; see Eq. (4.34). Training a neural network amounts to
choosing its biases and weights to minimize some fitting criterion, for example,
the least squares

$$S^2 = \sum_{t=1}^{T} (r_t - o_t)^2.$$


This is a nonlinear estimation problem that can be solved by several iterative meth- 
ods. To ensure the smoothness of the fitted function, some additional constraints 
can be added to the prior minimization problem. In the neural network literature, 
the back propagation (BP) learning algorithm is a popular method for network 
training. The BP method, introduced by Bryson and Ho (1969), works backward 
starting with the output layer and uses a gradient rule to modify the biases and 
weights iteratively. Appendix 2A of Ripley (1993) provides a derivation of back 
propagation. Once a feed-forward neural network is built, it can be used to compute 
forecasts in the forecasting subsample. 


Example 4.7. To illustrate applications of the neural network in finance, we 
consider the monthly log returns, in percentages and including dividends, for IBM 
stock from January 1926 to December 1999. We divide the data into two subsam- 
ples. The first subsample consisting of returns from January 1926 to December 
1997 for 864 observations is used for modeling. Using model (4.34) with three 
inputs and two nodes in the hidden layer, we obtain a 3-2-1 network for the
series. The three inputs are $r_{t-1}$, $r_{t-2}$, and $r_{t-3}$ and the biases and weights are
given next:

$$\hat{r}_t = 3.22 - 1.81 f_1(\mathbf{r}_{t-1}) - 2.28 f_2(\mathbf{r}_{t-1}) - 0.09 r_{t-1} - 0.05 r_{t-2} - 0.12 r_{t-3}, \tag{4.35}$$

where $\mathbf{r}_{t-1} = (r_{t-1}, r_{t-2}, r_{t-3})$ and the two logistic functions are

$$f_1(\mathbf{r}_{t-1}) = \frac{\exp(-8.34 - 18.97 r_{t-1} + 2.17 r_{t-2} - 19.17 r_{t-3})}{1 + \exp(-8.34 - 18.97 r_{t-1} + 2.17 r_{t-2} - 19.17 r_{t-3})},$$

$$f_2(\mathbf{r}_{t-1}) = \frac{\exp(39.25 - 22.17 r_{t-1} - 17.34 r_{t-2} - 5.98 r_{t-3})}{1 + \exp(39.25 - 22.17 r_{t-1} - 17.34 r_{t-2} - 5.98 r_{t-3})}.$$
The standard error of the residuals for the prior model is 6.56. For comparison, we 
also built an AR model for the data and obtained 


$$r_t = 1.101 + 0.077 r_{t-1} + a_t, \qquad \hat{\sigma}_a = 6.61. \tag{4.36}$$

The residual standard error is slightly greater than that of the feed-forward model
in Eq. (4.35).

Forecast Comparison

The monthly returns of IBM stock in 1998 and 1999 form the second subsample and
are used to evaluate the out-of-sample forecasting performance of neural networks.
As a benchmark for comparison, we use the sample mean of $r_t$ in the first subsample
as the 1-step-ahead forecast for all the monthly returns in the second subsample.
This corresponds to assuming that the log monthly price of IBM stock follows a
random walk with drift. The mean squared forecast error (MSFE) of this benchmark
model is 91.85. For the AR(1) model in Eq. (4.36), the MSFE of 1-step-ahead
forecasts is 91.70. Thus, the AR(1) model slightly outperforms the benchmark.
For the 3-2-1 feed-forward network in Eq. (4.35), the MSFE is 91.74, which is
essentially the same as that of the AR(1) model.


Remark. The estimation of feed-forward networks is done by using the nnet 
package of S-Plus with default starting weights; see Venables and Ripley (1999) 
for more information. Our limited experience shows that the estimation results 
vary. For the IBM stock returns used in Example 4.7, the out-of-sample MSE for
a 3-2-1 network can be as low as 89.46 and as high as 93.65. If we change the
number of nodes in the hidden layer, the range for the MSE becomes even wider.
The S-Plus commands used in Example 4.7 are given in Appendix B. 
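
For readers working in R, the sketch below shows how a 3-2-1 skip-layer network with a linear output node might be fitted with the nnet package and how the dependence on starting weights can be examined. It is not the Appendix B code: the series is simulated as a placeholder (the IBM returns are not reproduced here), and the 24-observation forecasting subsample, the seeds, and `maxit` are assumptions made for illustration.

```r
library(nnet)

set.seed(3)
r <- rnorm(300, mean = 1, sd = 6)   # placeholder series, NOT the IBM returns
E <- embed(r, 4)                    # columns: r_t, r_{t-1}, r_{t-2}, r_{t-3}
y <- E[, 1]
x <- E[, -1]
n   <- nrow(E)
est <- 1:(n - 24)                   # estimation subsample
fc  <- (n - 23):n                   # forecasting subsample (last 24 points)

# Benchmark: sample mean of the estimation subsample as 1-step-ahead forecast.
msfe_mean <- mean((y[fc] - mean(y[est]))^2)

# 3-2-1 skip-layer network; refit with different random starting weights.
fit_once <- function(seed) {
  set.seed(seed)
  nnet(x[est, ], y[est], size = 2, linout = TRUE, skip = TRUE,
       maxit = 500, trace = FALSE)
}
msfe_nn <- sapply(1:5, function(seed) {
  fit <- fit_once(seed)
  mean((y[fc] - predict(fit, x[fc, ]))^2)
})

n_wts <- length(fit_once(1)$wts)    # 2*(3+1) + (2+1) + 3 = 14 biases and weights
msfe_mean
range(msfe_nn)
```

The spread of `msfe_nn` across seeds gives a direct view of the variability described in the remark.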


Example 4.8. Nice features of the feed-forward network include its flexibility
and wide applicability. For illustration, we use the network with a Heaviside activation
function for the output layer to forecast the direction of price movement for
IBM stock considered in Example 4.7. Define a direction variable as

$$d_t = \begin{cases} 1 & \text{if } r_t > 0, \\ 0 & \text{if } r_t \le 0. \end{cases}$$

We use eight input nodes consisting of the first four lagged values of both $r_t$ and
$d_t$ and four nodes in the hidden layer to build an 8-4-1 feed-forward network
for $d_t$ in the first subsample. The resulting network is then used to compute the
1-step-ahead probability of an “upward movement” (i.e., a positive return) for the
following month in the second subsample. Figure 4.9 shows a typical output of
probability forecasts and the actual directions in the second subsample with the
latter denoted by circles. A horizontal line of 0.5 is added to the plot. If we take a
rigid approach by letting $\hat{d}_t = 1$ if the probability forecast is greater than or equal to
0.5 and $\hat{d}_t = 0$ otherwise, then the neural network has a success rate of 0.58. The
success rate of the network varies substantially from one estimation to another, and
the network uses 49 parameters. To gain more insight, we did a simulation study of
running the 8-4-1 feed-forward network 500 times and computed the number of
errors in predicting the upward and downward movement using the same method
as before. The mean and median of errors over the 500 runs are 11.28 and 11,
respectively, whereas the maximum and minimum number of errors are 18 and 4.




Figure 4.9 One-step-ahead probability forecasts for positive monthly return for IBM stock using an 
8-4-1 feed-forward neural network. Forecasting period is from January 1998 to December 1999. 


For comparison, we also did a simulation with 500 runs using a random walk with
drift, that is,

$$\hat{d}_t = \begin{cases} 1 & \text{if } \hat{r}_t = 1.19 + \epsilon_t > 0, \\ 0 & \text{otherwise}, \end{cases}$$

where 1.19 is the average monthly log return for IBM stock from January 1926 to
December 1997 and $\{\epsilon_t\}$ is a sequence of iid $N(0, 1)$ random variables. The mean
and median of the number of forecast errors become 10.53 and 11, whereas the
maximum and minimum number of errors are 17 and 5, respectively. Figure 4.10
shows the histograms of the number of forecast errors for the two simulations. The
results show that the 8-4-1 feed-forward neural network does not outperform the
simple model that assumes a random walk with drift for the monthly log price of
IBM stock.


4.2 NONLINEARITY TESTS 


In this section, we discuss some nonlinearity tests available in the literature that
have decent power against the nonlinear models considered in Section 4.1. The tests
discussed include both parametric and nonparametric statistics. The Ljung-Box
statistics of squared residuals, the bispectral test, and the Brock, Dechert, and
Scheinkman (BDS) test are nonparametric methods. The RESET test (Ramsey,
1969), the F tests of Tsay (1986, 1989), and other Lagrange multiplier and likelihood
ratio tests depend on specific parametric functions. Because nonlinearity
may occur in many ways, there exists no single test that dominates the others in
detecting nonlinearity.

Figure 4.10 Histograms of number of forecasting errors for directional movements of monthly log
returns of IBM stock. Forecasting period is from January 1998 to December 1999.


4.2.1 Nonparametric Tests 


Under the null hypothesis of linearity, residuals of a properly specified linear model 
should be independent. Any violation of independence in the residuals indicates 
inadequacy of the entertained model, including the linearity assumption. This is 
the basic idea behind various nonlinearity tests. In particular, some of the nonlin- 
earity tests are designed to check for possible violation in quadratic forms of the 
underlying time series. 


Q-Statistic of Squared Residuals 
McLeod and Li (1983) apply the Ljung-Box statistics to the squared residuals of
an ARMA(p, q) model to check for model inadequacy. The test statistic is

$$Q(m) = T(T+2) \sum_{i=1}^{m} \frac{\hat{\rho}_i^2(a_t^2)}{T - i},$$

where $T$ is the sample size, $m$ is a properly chosen number of autocorrelations
used in the test, $a_t$ denotes the residual series, and $\hat{\rho}_i(a_t^2)$ is the lag-$i$ ACF of $a_t^2$.
If the entertained linear model is adequate, $Q(m)$ is asymptotically a chi-squared
random variable with $m - p - q$ degrees of freedom. As mentioned in Chapter
3, the prior Q-statistic is useful in detecting conditional heteroscedasticity of $a_t$
and is asymptotically equivalent to the Lagrange multiplier test statistic of Engle
(1982) for ARCH models; see Section 3.4.3. The null hypothesis of the test is
$H_0: \beta_1 = \cdots = \beta_m = 0$, where $\beta_i$ is the coefficient of $a_{t-i}^2$ in the linear regression

$$a_t^2 = \beta_0 + \beta_1 a_{t-1}^2 + \cdots + \beta_m a_{t-m}^2 + e_t$$

for $t = m+1, \ldots, T$. Because the statistic is computed from residuals (not directly
from the observed returns), the number of degrees of freedom is $m - p - q$.
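
In R, the statistic can be computed with `Box.test` from the stats package applied to squared residuals, with `fitdf` set to $p + q$ so that the reference distribution has $m - p - q$ degrees of freedom. The sketch below uses a simulated ARCH(1) series as a stand-in for real data; the ARCH parameters, the AR(1) mean model, and $m = 12$ are assumptions chosen for illustration.

```r
# Simulate a series that is serially uncorrelated but not independent (ARCH(1)).
set.seed(4)
n <- 1000
e <- rnorm(n)
x <- numeric(n)
x[1] <- e[1]
for (t in 2:n) x[t] <- e[t] * sqrt(0.5 + 0.5 * x[t - 1]^2)

# Fit a linear AR(p) model and apply the Ljung-Box statistic to squared residuals.
p   <- 1
fit <- arima(x, order = c(p, 0, 0))
res <- residuals(fit)
ml  <- Box.test(res^2, lag = 12, type = "Ljung-Box", fitdf = p)
ml   # Q(12) with 12 - p = 11 degrees of freedom
```

A small p-value here points to dependence in the squared residuals, that is, to inadequacy of the fitted linear model with iid innovations.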


Bispectral Test 

This test can be used to test for linearity and Gaussianity. It depends on the result
that a properly normalized bispectrum of a linear time series is constant over all
frequencies and that the constant is zero under normality. The bispectrum of a time
series is the Fourier transform of its third-order moments. For a stationary time
series $x_t$ in Eq. (4.1), the third-order moment is defined as

$$c(u, v) = g \sum_{k=-\infty}^{\infty} \psi_k \psi_{k+u} \psi_{k+v}, \tag{4.37}$$

where $u$ and $v$ are integers, $g = E(a_t^3)$, $\psi_0 = 1$, and $\psi_k = 0$ for $k < 0$. Taking
Fourier transforms of Eq. (4.37), we have

$$b_3(w_1, w_2) = \frac{g}{4\pi^2}\, \Gamma[-(w_1 + w_2)]\, \Gamma(w_1)\, \Gamma(w_2), \tag{4.38}$$

where $\Gamma(w) = \sum_{u=0}^{\infty} \psi_u \exp(-iwu)$ with $i = \sqrt{-1}$, and $w_i$ are frequencies. Yet
the spectral density function of $x_t$ is given by

$$p(w) = \frac{\sigma_a^2}{2\pi} |\Gamma(w)|^2,$$

where $w$ denotes the frequency. Consequently, the function

$$b(w_1, w_2) = \frac{|b_3(w_1, w_2)|^2}{p(w_1)\, p(w_2)\, p(w_1 + w_2)} = \text{constant for all } (w_1, w_2). \tag{4.39}$$

The bispectrum test makes use of the property in Eq. (4.39). Basically, it estimates
the function $b(w_1, w_2)$ in Eq. (4.39) over a suitably chosen grid of points and
applies a test statistic similar to Hotelling’s $T^2$ statistic to check the constancy of
$b(w_1, w_2)$. For a linear Gaussian series, $E(a_t^3) = g = 0$ so that the bispectrum is
zero for all frequencies $(w_1, w_2)$. For further details of the bispectral test, see Priestley
(1988), Subba Rao and Gabr (1984), and Hinich (1982). Limited experience
shows that the test has decent power when the sample size is large.

BDS Statistic

Brock, Dechert, and Scheinkman (1987) propose a test statistic, commonly referred
to as the BDS test, to test the iid assumption of a time series. The statistic is,
therefore, different from other test statistics discussed because the latter mainly
focus on either the second- or third-order properties of $x_t$. The basic idea of the
BDS test is to make use of a “correlation integral” popular in chaotic time series
analysis. Given a k-dimensional time series $X_t$ and observations $\{X_t\}_{t=1}^{T_k}$, define
the correlation integral as

$$C_k(\delta) = \lim_{T_k \to \infty} \frac{2}{T_k(T_k - 1)} \sum_{i < j} I_\delta(X_i, X_j), \tag{4.40}$$

where $I_\delta(u, v)$ is an indicator variable that equals one if $\|u - v\| < \delta$, and zero
otherwise, where $\|\cdot\|$ is the supnorm. The correlation integral measures the fraction
of data pairs of $\{X_t\}$ that are within a distance of $\delta$ from each other. Consider
next a time series $x_t$. Construct k-dimensional vectors $X_t^k = (x_t, x_{t+1}, \ldots, x_{t+k-1})'$,
which are called k-histories. The idea of the BDS test is as follows. Treat a
k-history as a point in the k-dimensional space. If $\{x_t\}_{t=1}^{T}$ are indeed iid random
variables, then the k-histories $\{X_t^k\}_{t=1}^{T_k}$ should show no pattern in the k-dimensional
space. Consequently, the correlation integrals should satisfy the relation $C_k(\delta) =
[C_1(\delta)]^k$. Any departure from the prior relation suggests that $x_t$ are not iid. As
a simple, but informative example, consider a sequence of iid random variables
from the uniform distribution over [0, 1]. Let $[a, b]$ be a subinterval of [0, 1] and
consider the “2-history” $(x_t, x_{t+1})$, which represents a point in the two-dimensional
space. Under the iid assumption, the expected number of 2-histories in the subspace
$[a, b] \times [a, b]$ should equal the square of the expected number of $x_t$ in $[a, b]$. This
idea can be formally examined by using sample counterparts of correlation integrals.
Define

$$C_\ell(\delta, T) = \frac{2}{T_\ell(T_\ell - 1)} \sum_{i < j} I_\delta(X_i^*, X_j^*), \qquad \ell = 1, k,$$

where $T_\ell = T - \ell + 1$ and $X_i^* = x_i$ if $\ell = 1$ and $X_i^* = X_i^k$ if $\ell = k$. Under the
null hypothesis that $\{x_t\}$ are iid with a nondegenerate distribution function $F(\cdot)$,
Brock, Dechert, and Scheinkman (1987) show that

$$C_k(\delta, T) \to [C_1(\delta)]^k \quad \text{with probability 1, as } T \to \infty$$

for any fixed $k$ and $\delta$. Furthermore, the statistic $\sqrt{T}\{C_k(\delta, T) - [C_1(\delta, T)]^k\}$ is
asymptotically distributed as normal with mean zero and variance

$$\sigma_k^2(\delta) = 4\left(N^k + 2\sum_{j=1}^{k-1} N^{k-j} C^{2j} + (k-1)^2 C^{2k} - k^2 N C^{2k-2}\right),$$

where $C = \int [F(z + \delta) - F(z - \delta)]\, dF(z)$ and $N = \int [F(z + \delta) - F(z - \delta)]^2\, dF(z)$.
Note that $C_1(\delta, T)$ is a consistent estimate of $C$, and $N$ can be consistently
estimated by

$$N(\delta, T) = \frac{6}{T_k(T_k - 1)(T_k - 2)} \sum_{t < s < u} I_\delta(x_t, x_s)\, I_\delta(x_s, x_u).$$

The BDS test statistic is then defined as

$$D_k(\delta, T) = \frac{\sqrt{T}\{C_k(\delta, T) - [C_1(\delta, T)]^k\}}{\sigma_k(\delta, T)}, \tag{4.41}$$

where $\sigma_k(\delta, T)$ is obtained from $\sigma_k(\delta)$ when $C$ and $N$ are replaced by $C_1(\delta, T)$ and
$N(\delta, T)$, respectively. This test statistic has a standard normal limiting distribution.
For further discussion and examples of applying the BDS test, see Hsieh (1989)
and Brock, Hsieh, and LeBaron (1991). In application, one should remove linear
dependence, if any, from the data before applying the BDS test. The test may be
sensitive to the choices of $\delta$ and $k$, especially when $k$ is large.
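
The relation $C_k(\delta) = [C_1(\delta)]^k$ can be examined numerically with sample correlation integrals. The sketch below computes only $C_\ell(\delta, T)$ and the gap $C_k - C_1^k$; it does not compute the variance estimate or the full statistic in Eq. (4.41). The helper names, the choice $\delta = 0.5$ sample standard deviations, and the two simulated series are assumptions made for illustration.

```r
# Sample correlation integral: fraction of pairs of k-histories within
# sup-norm distance delta of each other.
corr_integral <- function(x, k, delta) {
  hist_k <- embed(x, k)                    # rows are the T - k + 1 k-histories
  d <- dist(hist_k, method = "maximum")    # sup-norm distance for every pair i < j
  mean(as.vector(d) < delta)
}

# Gap C_k(delta, T) - [C_1(delta, T)]^k, which should be near zero under iid.
ci_gap <- function(x, k = 2, delta = 0.5 * sd(x)) {
  corr_integral(x, k, delta) - corr_integral(x, 1, delta)^k
}

set.seed(5)
n   <- 500
iid <- runif(n)                                         # iid uniform on [0, 1]
ar1 <- as.numeric(arima.sim(list(ar = 0.9), n = n))     # strongly dependent series

gap_iid <- ci_gap(iid)
gap_ar1 <- ci_gap(ar1)
c(iid = gap_iid, ar1 = gap_ar1)
```

The AR(1) series shows a clearly positive gap because of its linear dependence, which illustrates why linear dependence should be removed before applying the BDS test to detect nonlinearity.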


4.2.2 Parametric Tests 


Turning to parametric tests, we consider the RESET test of Ramsey (1969) and its 
generalizations. We also discuss some test statistics for detecting threshold non- 
linearity. To simplify the notation, we use vectors and matrices in the discussion. 
If necessary, readers may consult Appendix A of Chapter 8 for a brief review on 
vectors and matrices. 


The RESET Test 

Ramsey (1969) proposes a specification test for linear least-squares regression analysis.
The test is referred to as a RESET test and is readily applicable to linear AR
models. Consider the linear AR(p) model

$$x_t = X_{t-1}'\phi + a_t, \tag{4.42}$$

where $X_{t-1} = (1, x_{t-1}, \ldots, x_{t-p})'$ and $\phi = (\phi_0, \phi_1, \ldots, \phi_p)'$. The first step of the
RESET test is to obtain the least-squares estimate $\hat{\phi}$ of Eq. (4.42) and compute
the fit $\hat{x}_t = X_{t-1}'\hat{\phi}$, the residual $\hat{a}_t = x_t - \hat{x}_t$, and the sum of squared residuals
$SSR_0 = \sum_{t=p+1}^{T} \hat{a}_t^2$, where $T$ is the sample size. In the second step, consider the
linear regression

$$\hat{a}_t = X_{t-1}'\alpha_1 + M_{t-1}'\alpha_2 + v_t, \tag{4.43}$$

where $M_{t-1} = (\hat{x}_t^2, \ldots, \hat{x}_t^{s+1})'$ for some $s \ge 1$, and compute the least-squares residuals

$$\hat{v}_t = \hat{a}_t - X_{t-1}'\hat{\alpha}_1 - M_{t-1}'\hat{\alpha}_2$$

and the sum of squared residuals $SSR_1 = \sum_{t=p+1}^{T} \hat{v}_t^2$ of the regression. The basic
idea of the RESET test is that if the linear AR(p) model in Eq. (4.42) is adequate,
then $\alpha_1$ and $\alpha_2$ of Eq. (4.43) should be zero. This can be tested by the usual F
statistic of Eq. (4.43) given by

$$F = \frac{(SSR_0 - SSR_1)/g}{SSR_1/(T - p - g)} \quad \text{with } g = s + p + 1, \tag{4.44}$$

which, under the linearity and normality assumption, has an F distribution with
degrees of freedom $g$ and $T - p - g$.

Remark. Because $\hat{x}_t^k$ for $k = 2, \ldots, s+1$ tend to be highly correlated
with $X_{t-1}$ and among themselves, principal components of $M_{t-1}$ that are not
co-linear with $X_{t-1}$ are often used in fitting Eq. (4.43). Principal component
analysis is a statistical tool for dimension reduction; see Chapter 8 for more
information.
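
The two steps of the RESET test translate directly into two least-squares fits. The sketch below follows Eqs. (4.42)-(4.44) using the raw powers of the fitted values rather than their principal components; the function name `reset_ar`, its defaults, and the simulated AR(1) series are assumptions made for illustration.

```r
# RESET test for a linear AR(p) model, following Eqs. (4.42)-(4.44).
reset_ar <- function(x, p = 1, s = 2) {
  E    <- embed(x, p + 1)
  y    <- E[, 1]                                  # x_t
  X    <- E[, -1, drop = FALSE]                   # x_{t-1}, ..., x_{t-p}
  fit0 <- lm(y ~ X)                               # step 1: linear AR(p) fit
  xhat <- fitted(fit0)
  ahat <- residuals(fit0)
  M    <- sapply(2:(s + 1), function(k) xhat^k)   # xhat^2, ..., xhat^(s+1)
  fit1 <- lm(ahat ~ X + M)                        # step 2: Eq. (4.43)
  ssr0 <- sum(ahat^2)
  ssr1 <- sum(residuals(fit1)^2)
  nT   <- length(x)
  g    <- s + p + 1
  Fstat <- ((ssr0 - ssr1) / g) / (ssr1 / (nT - p - g))   # Eq. (4.44)
  c(F = Fstat, df1 = g, df2 = nT - p - g,
    p.value = pf(Fstat, g, nT - p - g, lower.tail = FALSE))
}

set.seed(6)
x   <- as.numeric(arima.sim(list(ar = 0.5), n = 300))    # linear AR(1) series
out <- reset_ar(x, p = 1, s = 2)
out
```

For larger $s$ the powers of $\hat{x}_t$ become nearly collinear, which is the situation the remark above addresses with principal components.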


Keenan (1985) proposes a nonlinearity test for time series that uses $\hat{x}_t^2$ only and
modifies the second step of the RESET test to avoid multicollinearity between $\hat{x}_t^2$
and $X_{t-1}$. Specifically, the linear regression (4.43) is divided into two steps. In
step 2(a), one removes linear dependence of $\hat{x}_t^2$ on $X_{t-1}$ by fitting the regression

$$\hat{x}_t^2 = X_{t-1}'\beta + u_t$$

and obtaining the residual $\hat{u}_t = \hat{x}_t^2 - X_{t-1}'\hat{\beta}$. In step 2(b), consider the linear regression

$$\hat{a}_t = \hat{u}_t \alpha + v_t,$$

and obtain the sum of squared residuals $SSR_1 = \sum_{t=p+1}^{T} (\hat{a}_t - \hat{u}_t\hat{\alpha})^2 = \sum_{t=p+1}^{T} \hat{v}_t^2$
to test the null hypothesis $\alpha = 0$.


The F Test 

To improve the power of Keenan’s test and the RESET test, Tsay (1986) uses a
different choice of the regressor $M_{t-1}$. Specifically, he suggests using $M_{t-1} =
\mathrm{vech}(X_{t-1}X_{t-1}')$, where $\mathrm{vech}(A)$ denotes the half-stacking vector of the matrix
$A$ using elements on and below the diagonal only; see Appendix B of Chapter 8
for more information about the operator. Here the products are formed from the
lagged values $x_{t-1}, \ldots, x_{t-p}$ only, without the constant term. For example, if $p = 2$, then $M_{t-1} =
(x_{t-1}^2, x_{t-1}x_{t-2}, x_{t-2}^2)'$. The dimension of $M_{t-1}$ is $p(p+1)/2$ for an AR(p) model.
In practice, the test is simply the usual partial F statistic for testing $\alpha = 0$ in the
linear least-squares regression

$$x_t = X_{t-1}'\phi + M_{t-1}'\alpha + e_t,$$

where $e_t$ denotes the error term. Under the assumption that $x_t$ is a linear AR(p)
process, the partial F statistic follows an F distribution with degrees of freedom
$g$ and $T - p - g - 1$, where $g = p(p+1)/2$. We refer to this F test as the Ori-F
test. Luukkonen, Saikkonen, and Teräsvirta (1988) further extend the test by
augmenting $M_{t-1}$ with cubic terms $x_{t-i}^3$ for $i = 1, \ldots, p$.
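
A partial F test of this kind can be obtained in R by comparing two nested `lm` fits with `anova`. The sketch below is one possible implementation under stated assumptions: the function name `ori_f`, the bilinear series used as an example, and its coefficients are illustrative choices. Note that `anova` reports residual degrees of freedom based on the effective sample size after dropping the first $p$ observations.

```r
# Ori-F test: partial F test for the cross-product terms vech(X X').
ori_f <- function(x, p = 2) {
  E <- embed(x, p + 1)
  y <- E[, 1]                                   # x_t
  X <- E[, -1, drop = FALSE]                    # x_{t-1}, ..., x_{t-p}
  # Index pairs on and below the diagonal of a p x p matrix.
  idx <- which(lower.tri(diag(p), diag = TRUE), arr.ind = TRUE)
  M <- X[, idx[, "row"], drop = FALSE] * X[, idx[, "col"], drop = FALSE]
  fit0 <- lm(y ~ X)                             # linear AR(p)
  fit1 <- lm(y ~ X + M)                         # augmented regression
  anova(fit0, fit1)                             # partial F test of alpha = 0
}

# Example with a simulated bilinear series (coefficients assumed for illustration).
set.seed(8)
n <- 500
a <- rnorm(n)
x <- numeric(n)
for (t in 2:n) x[t] <- 0.4 * x[t - 1] + 0.5 * x[t - 1] * a[t - 1] + a[t]
tab <- ori_f(x, p = 2)
tab
```

With $p = 2$ the augmented regression adds $p(p+1)/2 = 3$ regressors, which appears as the `Df` entry in the second row of the table.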


Threshold Test 
When the alternative model under study is a SETAR model, one can derive specific 
test statistics to increase the power of the test. One of the specific tests is the 
likelihood ratio statistic. This test, however, encounters the difficulty of undefined 
parameters under the null hypothesis of linearity because the threshold is undefined 
for a linear AR process. Another specific test seeks to transform testing threshold 
nonlinearity into detecting model changes. It is then interesting to discuss the 
differences between these two specific tests for threshold nonlinearity. 

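To simplify the discussion, let us consider the simple case that the alternative
model is a 2-regime SETAR model with threshold variable $x_{t-d}$. The null
hypothesis $H_0$: $x_t$ follows the linear AR(p) model

$$x_t = \phi_0 + \sum_{i=1}^{p} \phi_i x_{t-i} + a_t, \tag{4.45}$$

whereas the alternative hypothesis $H_a$: $x_t$ follows the SETAR model

$$x_t = \begin{cases} \phi_0^{(1)} + \sum_{i=1}^{p} \phi_i^{(1)} x_{t-i} + a_{1t} & \text{if } x_{t-d} < r_1, \\ \phi_0^{(2)} + \sum_{i=1}^{p} \phi_i^{(2)} x_{t-i} + a_{2t} & \text{if } x_{t-d} \ge r_1, \end{cases} \tag{4.46}$$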

where $r_1$ is the threshold. For a given realization $\{x_t\}_{t=1}^{T}$ and assuming normality,
let $l_0(\hat{\phi}, \hat{\sigma}_a^2)$ be the log-likelihood function evaluated at the maximum-likelihood
estimates of $\phi = (\phi_0, \ldots, \phi_p)'$ and $\sigma_a^2$. This is easy to compute. The likelihood
function under the alternative is also easy to compute if the threshold $r_1$ is given.
Let $l_1(r_1; \hat{\phi}_1, \hat{\sigma}_1^2; \hat{\phi}_2, \hat{\sigma}_2^2)$ be the log-likelihood function evaluated at the maximum-likelihood
estimates of $\phi_i = (\phi_0^{(i)}, \ldots, \phi_p^{(i)})'$ and $\sigma_i^2$ conditioned on knowing the
threshold $r_1$. The log-likelihood ratio $l(r_1)$ defined as

$$l(r_1) = l_1(r_1; \hat{\phi}_1, \hat{\sigma}_1^2; \hat{\phi}_2, \hat{\sigma}_2^2) - l_0(\hat{\phi}, \hat{\sigma}_a^2)$$

is then a function of the threshold $r_1$, which is unknown. Yet under the null hypothesis,
there is no threshold and $r_1$ is not defined. The parameter $r_1$ is referred to
as a nuisance parameter under the null hypothesis. Consequently, the asymptotic
distribution of the likelihood ratio is very different from that of the conventional
likelihood ratio statistics. See Chan (1991) for further details and critical values of
the test. A common approach is to use $l_{\max} = \sup_{v < r_1 < u} l(r_1)$ as the test statistic,
where $v$ and $u$ are prespecified lower and upper bounds of the threshold. Davis
(1987) and Andrews and Ploberger (1994) provide further discussion on hypothesis
testing involving nuisance parameters under the null hypothesis. Simulation is often
used to obtain empirical critical values of the test statistic $l_{\max}$, which depends on
the choices of $v$ and $u$. The average of $l(r_1)$ over $r_1 \in [v, u]$ is also considered by
Andrews and Ploberger as a test statistic.

Tsay (1989) makes use of arranged autoregression and recursive estimation to
derive an alternative test for threshold nonlinearity. The arranged autoregression
seeks to transform the SETAR model under the alternative hypothesis $H_a$ into a model
change problem with the threshold $r_1$ serving as the change point. To see this, the
SETAR model in Eq. (4.46) says that $x_t$ follows essentially two linear models
depending on whether $x_{t-d} < r_1$ or $x_{t-d} \ge r_1$. For a realization $\{x_t\}_{t=1}^{T}$, $x_{t-d}$ can
assume values $\{x_1, \ldots, x_{T-d}\}$. Let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(T-d)}$ be the ordered statistics
of $\{x_t\}_{t=1}^{T-d}$ (i.e., arranging the observations in increasing order). The SETAR
model can then be written as

$$x_{(j)+d} = \beta_0 + \sum_{i=1}^{p} \beta_i x_{(j)+d-i} + a_{(j)+d}, \qquad j = 1, \ldots, T - d, \tag{4.47}$$

where $\beta_i = \phi_i^{(1)}$ if $x_{(j)} < r_1$ and $\beta_i = \phi_i^{(2)}$ if $x_{(j)} \ge r_1$. Consequently, the threshold
$r_1$ is a change point for the linear regression in Eq. (4.47), and we refer to Eq. (4.47)
as an arranged autoregression (in increasing order of the threshold $x_{t-d}$). Note that
the arranged autoregression in (4.47) does not alter the dynamic dependence of $x_t$
on $x_{t-i}$ for $i = 1, \ldots, p$ because $x_{(j)+d}$ still depends on $x_{(j)+d-i}$ for $i = 1, \ldots, p$.
What is done is simply to present the SETAR model in the threshold space instead
of in the time space. That is, the equation with a smaller $x_{t-d}$ appears before that
with a larger $x_{t-d}$. The threshold test of Tsay (1989) is obtained as follows.


• Step 1. Fit Eq. (4.47) using $j = 1, \ldots, m$, where $m$ is a prespecified positive
integer (e.g., 30). Denote the least-squares estimates of $\beta_i$ by $\hat{\beta}_{i,m}$, where $m$
denotes the number of data points used in estimation.

• Step 2. Compute the predictive residual

$$\hat{a}_{(m+1)+d} = x_{(m+1)+d} - \hat{\beta}_{0,m} - \sum_{i=1}^{p} \hat{\beta}_{i,m} x_{(m+1)+d-i}$$

and its standard error. Let $\hat{e}_{(m+1)+d}$ be the standardized predictive residual.

• Step 3. Use the recursive least-squares method to update the least-squares
estimates to $\hat{\beta}_{i,m+1}$ by incorporating the new data point $x_{(m+1)+d}$.

• Step 4. Repeat steps 2 and 3 until all data points are processed.

• Step 5. Consider the linear regression of the standardized predictive residual

$$\hat{e}_{(m+j)+d} = \alpha_0 + \sum_{i=1}^{p} \alpha_i x_{(m+j)+d-i} + v_t, \qquad j = 1, \ldots, T-d-m, \qquad (4.48)$$

and compute the usual F statistic for testing $\alpha_i = 0$ in Eq. (4.48) for $i = 0, \ldots, p$.
Under the null hypothesis that $x_t$ follows a linear AR($p$) model,
the F ratio has a limiting F distribution with degrees of freedom $p + 1$ and
$T - d - m - p$.


We refer to the earlier F test as a TAR-F test. The idea behind the test is that 
under the null hypothesis there is no model change in the arranged autoregression 
in Eq. (4.47) so that the standardized predictive residuals should be close to iid 
with mean zero and variance 1. In this case, they should have no correlations with 
the regressors $x_{(m+j)+d-i}$. For further details including formulas for a recursive
least-squares method and some simulation study on performance of the TAR-F 
test, see Tsay (1989). The TAR-F test avoids the problem of nuisance parameters 
encountered by the likelihood ratio test. It does not require knowing the threshold
$r_1$. It simply tests that the predictive residuals have no correlations with regressors
if the null hypothesis holds. Therefore, the test does not depend on knowing the 
number of regimes in the alternative model. Yet the TAR-F test is not as powerful 
as the likelihood ratio test if the true model is indeed a 2-regime SETAR model 
with a known innovational distribution. 
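The steps of the TAR-F test can be sketched in R as follows. This is a minimal illustration, not the procedure of Tsay (1989) verbatim. The function name `tar_f_test` is hypothetical. The coefficients are re-estimated by ordinary least squares at each step rather than updated by a recursive formula (the estimates are the same). The predictive residual is scaled by $\sqrt{1 + x'(X'X)^{-1}x}$, which is an assumed standardization; the F ratio is unaffected by a common scale factor. The denominator degrees of freedom are taken as the number of predictive residuals minus $p + 1$.

```r
# Sketch of a TAR-F type test using base R only (assumptions noted above).
tar_f_test <- function(x, p = 2, d = 1, m = 30) {
  n  <- length(x)
  h  <- max(p, d)                      # first usable time index is h + 1
  tt <- (h + 1):n
  y  <- x[tt]                          # responses x_t
  X  <- cbind(1, sapply(1:p, function(i) x[tt - i]))  # intercept and p lags
  ord <- order(x[tt - d])              # arrange cases by increasing x_{t-d}
  y <- y[ord]
  X <- X[ord, , drop = FALSE]
  N <- length(y)
  e <- numeric(N - m)                  # standardized predictive residuals
  for (k in m:(N - 1)) {
    Xk <- X[1:k, , drop = FALSE]
    P  <- solve(crossprod(Xk))         # (X'X)^{-1} from the first k cases
    b  <- P %*% crossprod(Xk, y[1:k])  # least-squares estimates
    xn <- X[k + 1, ]
    a_hat <- y[k + 1] - sum(xn * b)    # one-step predictive residual
    e[k - m + 1] <- a_hat / sqrt(1 + sum(xn * (P %*% xn)))
  }
  Z   <- X[(m + 1):N, , drop = FALSE]  # regressors matched to e
  rss <- sum(lm.fit(Z, e)$residuals^2)
  df1 <- p + 1
  df2 <- length(e) - p - 1
  Fstat <- ((sum(e^2) - rss) / df1) / (rss / df2)
  c(F = Fstat, df1 = df1, df2 = df2,
    p.value = pf(Fstat, df1, df2, lower.tail = FALSE))
}

# Usage on a simulated linear AR(1) series
set.seed(123)
x_lin <- as.numeric(arima.sim(list(ar = 0.5), n = 300))
res <- tar_f_test(x_lin, p = 2, d = 1, m = 30)
print(res)
```

A large F ratio relative to the reference F distribution would point toward threshold nonlinearity for the chosen delay `d`.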


4.2.3 Applications 


In this subsection, we apply some of the nonlinearity tests discussed previously to 
five time series. For a real financial time series, an AR model is used to remove 
any serial correlation in the data, and the tests apply to the residual series of the 
model. The five series employed are as follows: 


1. $r_{1t}$: A simulated series of iid $N(0, 1)$ with 500 observations.

2. $r_{2t}$: A simulated series of iid Student-$t$ distribution with 6 degrees of freedom.
The sample size is 500.
3. $a_{3t}$: The residual series of monthly log returns of CRSP equal-weighted index
from 1926 to 1997 with 864 observations. The linear AR model used is

$$(1 - 0.180B + 0.099B^3 - 0.105B^9)r_{3t} = 0.0086 + a_{3t}.$$

4. $a_{4t}$: The residual series of monthly log returns of CRSP value-weighted index
from 1926 to 1997 with 864 observations. The linear AR model used is

$$(1 - 0.098B + 0.111B^3 - 0.088B^5)r_{4t} = 0.0078 + a_{4t}.$$


5. $a_{5t}$: The residual series of monthly log returns of IBM stock from 1926 to
1997 with 864 observations. The linear AR model used is

$$(1 - 0.077B)r_{5t} = 0.011 + a_{5t}.$$


Table 4.2 shows the results of the nonlinearity test. For the simulated series and
IBM returns, the F tests are based on an AR(6) model. For the index returns, the
AR order is the same as the model given earlier. For the BDS test, we chose $\delta =
\hat{\sigma}_a$ and $\delta = 1.5\hat{\sigma}_a$ with $k = 2, \ldots, 5$. Also given in the table are the Ljung-Box
statistics that confirm no serial correlation in the residual series before applying
nonlinearity tests. Compared with their asymptotic critical values, the BDS test and
F tests are insignificant at the 5% level for the simulated series. However, the BDS
tests are highly significant for the real financial time series. The F tests also show
significant results for the index returns, but they fail to suggest nonlinearity in the
IBM log returns. In summary, the tests confirm that the simulated series are linear
and suggest that the stock returns are nonlinear.


TABLE 4.2 Nonlinearity Tests for Simulated Series and Some Log Stock Returns^a

                               BDS ($\delta = 1.5\hat{\sigma}_a$)
Data       Q(5)    Q(10)      2          3          4          5
N(0,1)      3.2     6.5     -0.32      -0.14      -0.15      -0.33
t6          0.9     1.7     -0.87   [illegible]   -1.56      -1.71
ln(ew)      2.9     4.9      9.94      11.72      12.83      13.65
ln(vw)      1.0     9.8      8.61       9.88      10.70      11.29
ln(ibm)     0.6     7.1      4.96       6.09       6.68       6.82

            d = 1              BDS ($\delta = \hat{\sigma}_a$)
Data       Ori-F   TAR-F      2          3          4          5
N(0,1)     1.13    0.87     -0.77      -0.71      -1.04      -1.27
t6         0.69    0.81     -0.35      -0.76   [illegible]   -1.49
ln(ew)     5.05    6.77     10.01      11.85      13.14      14.45
ln(vw)     4.95    6.85      7.01       7.83       8.64       9.53
ln(ibm)    1.32    1.51      3.82       4.70       5.45       5.72

^a The sample size of simulated series is 500 and that of stock returns is 864. The BDS test uses
$k = 2, \ldots, 5$.


4.3 MODELING 


Nonlinear time series modeling necessarily involves subjective judgment. However, 
there are some general guidelines to follow. It starts with building an adequate lin- 
ear model on which nonlinearity tests are based. For financial time series, the 
Ljung-Box statistics and Engle’s test are commonly used to detect conditional
heteroscedasticity. For general series, other tests of Section 4.2 apply. If nonlin- 
earity is statistically significant, then one chooses a class of nonlinear models to 
entertain. The selection here may depend on the experience of the analyst and the 
substantive matter of the problem under study. For volatility models, the order 
of an ARCH process can often be determined by checking the partial autocorre- 
lation function of the squared series. For GARCH and EGARCH models, only 
lower orders such as (1,1), (1,2), and (2,1) are considered in most applications. 
Higher order models are hard to estimate and understand. For TAR models, one
may use the procedures given in Tong (1990) and Tsay (1989, 1998) to build an 
adequate model. When the sample size is sufficiently large, one may apply non- 
parametric techniques to explore the nonlinear feature of the data and choose a 
proper nonlinear model accordingly; see Chen and Tsay (1993a) and Cai, Fan, and 
Yao (2000). The MARS procedure of Lewis and Stevens (1991) can also be used 
to explore the dynamic structure of the data. Finally, information criteria such as 
the Akaike information criterion (Akaike, 1974) and the generalized odds ratios in
Chen, McCulloch, and Tsay (1997) can be used to discriminate between competing 
nonlinear models. The chosen model should be carefully checked before it is used 
for prediction. 


4.4 FORECASTING 


Unlike the linear model, there exist no closed-form formulas to compute forecasts 
of most nonlinear models when the forecast horizon is greater than 1. We use 
parametric bootstraps to compute nonlinear forecasts. It is understood that the model 
used in forecasting has been rigorously checked and is judged to be adequate for the 
series under study. By a model, we mean the dynamic structure and innovational 
distributions. In some cases, we may treat the estimated parameters as given. 


4.4.1 Parametric Bootstrap 


Let $T$ be the forecast origin and $\ell$ be the forecast horizon ($\ell > 0$). That is, we
are at time index $T$ and interested in forecasting $x_{T+\ell}$. The parametric bootstrap
considered computes realizations $x_{T+1}, \ldots, x_{T+\ell}$ sequentially by (a) drawing a
new innovation from the specified innovational distribution of the model, and (b)
computing $x_{T+i}$ using the model, data, and previous forecasts $x_{T+1}, \ldots, x_{T+i-1}$.
This results in a realization for $x_{T+\ell}$. The procedure is repeated $M$ times to obtain
$M$ realizations of $x_{T+\ell}$ denoted by $\{x_{T+\ell}^{(j)}\}_{j=1}^{M}$. The point forecast of $x_{T+\ell}$ is then
the sample average of $x_{T+\ell}^{(j)}$. Let the forecast be $x_T(\ell)$. We used $M = 3000$ in
some applications and the results seem fine. The realizations $\{x_{T+\ell}^{(j)}\}_{j=1}^{M}$ can also
be used to obtain an empirical distribution of $x_{T+\ell}$. We make use of this empirical
distribution later to evaluate forecasting performance.
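As a rough illustration of the parametric bootstrap, the following R sketch simulates $M$ paths from a two-regime SETAR(1) model with Gaussian innovations. The model, its parameter values, and the function name `boot_forecast` are hypothetical and chosen only for demonstration; they are not estimates from any series in this chapter.

```r
# Parametric bootstrap forecasts for a hypothetical 2-regime SETAR(1) model:
#   x_t = phi0[k] + phi1[k] * x_{t-1} + sigma[k] * e_t,  e_t ~ N(0, 1),
# with regime k = 1 if x_{t-1} < r and k = 2 otherwise.
boot_forecast <- function(x, ell, M = 3000,
                          phi0 = c(-0.2, 0.3), phi1 = c(0.6, -0.4),
                          sigma = c(1, 0.5), r = 0) {
  draws <- replicate(M, {
    prev <- x[length(x)]               # start each path at the forecast origin
    for (i in seq_len(ell)) {
      k <- if (prev < r) 1 else 2      # regime implied by the previous value
      prev <- phi0[k] + phi1[k] * prev + sigma[k] * rnorm(1)
    }
    prev                               # one realization of x_{T+ell}
  })
  list(point = mean(draws), draws = draws)
}

set.seed(1)
fc <- boot_forecast(x = c(0.4, -0.1, 0.7), ell = 3)
fc$point                               # point forecast x_T(3)
quantile(fc$draws, c(0.025, 0.975))    # empirical 95% forecast interval
```

The vector `fc$draws` plays the role of the empirical distribution of $x_{T+\ell}$ and can be reused for the distributional measures of forecast evaluation.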


4.4.2 Forecasting Evaluation 


There are many ways to evaluate the forecasting performance of a model, ranging 
from directional measures to magnitude measures to distributional measures. A 
directional measure considers the future direction (up or down) implied by the 
model. Predicting that tomorrow’s S&P 500 index will go up or down is an example 
of directional forecasts that are of practical interest. Predicting the year-end value 
of the daily S&P 500 index belongs to the case of magnitude measure. Finally, 
assessing the likelihood that the daily S&P 500 index will go up 10% or more
between now and the year end requires knowing the future conditional probability 
distribution of the index. Evaluating the accuracy of such an assessment needs a 
distributional measure. 

In practice, the available data set is divided into two subsamples. The first sub- 
sample of the data is used to build a nonlinear model, and the second subsample 
is used to evaluate the forecasting performance of the model. We refer to the 
two subsamples of data as estimation and forecasting subsamples. In some stud- 
ies, a rolling forecasting procedure is used in which a new data point is moved 
from the forecasting subsample into the estimation subsample as the forecast origin 
advances. In what follows, we briefly discuss some measures of forecasting perfor- 
mance that are commonly used in the literature. Keep in mind, however, that there 
exists no widely accepted single measure to compare models. A utility function 
based on the objective of the forecast might be needed to better understand the 
comparison. 


Directional Measure 

A typical measure here is to use a 2 × 2 contingency table that summarizes the
number of “hits” and “misses” of the model in predicting ups and downs of $x_{T+\ell}$
in the forecasting subsample. Specifically, the contingency table is given as

Actual      Predicted
            Up          Down
Up          $m_{11}$    $m_{12}$    $m_{10}$
Down        $m_{21}$    $m_{22}$    $m_{20}$
            $m_{01}$    $m_{02}$    $m$

where $m$ is the total number of $\ell$-step-ahead forecasts in the forecasting subsample,
$m_{11}$ is the number of “hits” in predicting upward movements, $m_{21}$ is the number
of “misses” in predicting downward movements of the market, and so on. Larger
values in $m_{11}$ and $m_{22}$ indicate better forecasts. The test statistic

$$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(m_{ij} - m_{i0}m_{0j}/m)^2}{m_{i0}m_{0j}/m}$$

can then be used to evaluate the performance of the model. A large $\chi^2$ signifies that
the model outperforms the chance of random choice. Under some mild conditions,
$\chi^2$ has an asymptotic chi-squared distribution with 1 degree of freedom. For further
discussion of this measure, see Dahl and Hylleberg (1999).

For illustration of the directional measure, consider the 1-step-ahead probability
forecasts of the 8-4-1 feed-forward neural network shown in Figure 4.9. The
2 × 2 table of “hits” and “misses” of the network is

Actual      Predicted
Up                                  14
Down                                10
                                    24


The table shows that the network predicts the upward movement well, but fares 
poorly in forecasting the downward movement of the stock. The chi-squared statis- 
tic of the table is 0.137 with a p value of 0.71. Consequently, the network does 
not significantly outperform a random-walk model with equal probabilities for 
“upward” and “downward” movements. 
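The chi-squared statistic of a 2 × 2 table of hits and misses can be computed directly from its definition, as in the R sketch below. The cell counts (12, 2, 8, 2) are assumed for illustration; they agree with the row totals 14 and 10 and give the reported statistic of 0.137. The function name `dir_chisq` is hypothetical.

```r
# Chi-squared directional measure for a 2 x 2 table of hits and misses.
dir_chisq <- function(tab) {
  m <- sum(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / m   # m_i0 * m_0j / m
  stat <- sum((tab - expected)^2 / expected)
  c(chisq = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Assumed cell counts (rows: actual, columns: predicted)
tab <- matrix(c(12, 8, 2, 2), nrow = 2,
              dimnames = list(actual = c("Up", "Down"),
                              predicted = c("Up", "Down")))
out <- dir_chisq(tab)
round(out, 3)
```

No continuity correction is applied here, which matches the formula for the test statistic given in the text.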


Magnitude Measure 

Three statistics are commonly used to measure performance of point forecasts. They
are the mean squared error (MSE), mean absolute deviation (MAD), and mean
absolute percentage error (MAPE). For $\ell$-step-ahead forecasts, these measures are
defined as

$$\mathrm{MSE}(\ell) = \frac{1}{m} \sum_{j=0}^{m-1} [x_{T+\ell+j} - x_{T+j}(\ell)]^2, \qquad (4.49)$$

$$\mathrm{MAD}(\ell) = \frac{1}{m} \sum_{j=0}^{m-1} |x_{T+\ell+j} - x_{T+j}(\ell)|, \qquad (4.50)$$

$$\mathrm{MAPE}(\ell) = \frac{1}{m} \sum_{j=0}^{m-1} \left| \frac{x_{T+j}(\ell)}{x_{T+j+\ell}} - 1 \right|, \qquad (4.51)$$

where $m$ is the number of $\ell$-step-ahead forecasts available in the forecasting
subsample. In application, one often chooses one of the above three measures, and
the model with the smallest magnitude on that measure is regarded as the best $\ell$-step-ahead
forecasting model. It is possible that different $\ell$ may result in selecting
different models. The measures also have other limitations in model comparison;
see, for instance, Clements and Hendry (1993).
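The three magnitude measures translate directly into R. In the sketch below, `actual` holds the observed values $x_{T+\ell+j}$ and `forecast` holds the matching forecasts $x_{T+j}(\ell)$; the function name and the toy numbers are hypothetical.

```r
# MSE, MAD, and MAPE for a set of ell-step-ahead forecasts.
forecast_accuracy <- function(actual, forecast) {
  stopifnot(length(actual) == length(forecast))
  err <- actual - forecast
  c(MSE  = mean(err^2),
    MAD  = mean(abs(err)),
    MAPE = mean(abs(forecast / actual - 1)))
}

actual   <- c(2, 4, 5)      # toy observed values
forecast <- c(1, 5, 5)      # toy forecasts
acc <- forecast_accuracy(actual, forecast)
print(acc)
```

MAPE is undefined when an observed value is zero, so it is usually unsuitable for return series that fluctuate around zero.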


Distributional Measure 

Practitioners recently began to assess forecasting performance of a model using
its predictive distributions. Strictly speaking, a predictive distribution incorporates
parameter uncertainty in forecasts. We call it conditional predictive distribution if
the parameters are treated as fixed. The empirical distribution of $x_{T+\ell}$ obtained
by the parametric bootstrap is a conditional predictive distribution. This empirical
distribution is often used to compute a distributional measure. Let $u_T(\ell)$ be the
percentile of the observed $x_{T+\ell}$ in the prior empirical distribution. We then have
a set of $m$ percentiles $\{u_{T+j}(\ell)\}_{j=0}^{m-1}$, where again $m$ is the number of $\ell$-step-ahead
forecasts in the forecasting subsample. If the model entertained is adequate,
$\{u_{T+j}(\ell)\}$ should be a random sample from the uniform distribution on [0, 1].
For a sufficiently large $m$, one can compute the Kolmogorov-Smirnov statistic of
$\{u_{T+j}(\ell)\}$ with respect to uniform [0, 1]. The statistic can be used for both model
checking and forecasting comparison.
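A minimal R sketch of the distributional measure follows. It assumes that the bootstrap draws for each forecast origin are stored in one row of a matrix `draws`; the function names `pit_values` and `ks_uniform`, and the simulated inputs, are hypothetical.

```r
# Percentile of each observed value within its bootstrap forecast distribution.
pit_values <- function(obs, draws) {
  vapply(seq_along(obs),
         function(j) mean(draws[j, ] <= obs[j]), numeric(1))
}

# One-sample Kolmogorov-Smirnov distance from the uniform [0, 1] distribution.
ks_uniform <- function(u) {
  u <- sort(u)
  m <- length(u)
  i <- seq_len(m)
  max(pmax(i / m - u, u - (i - 1) / m))
}

# Toy check: observations drawn from the same law as the forecast draws
set.seed(7)
m <- 100
M <- 500
draws <- matrix(rnorm(m * M), nrow = m)   # row j: draws for origin T + j
obs <- rnorm(m)
u <- pit_values(obs, draws)
D <- ks_uniform(u)
print(D)
```

The built-in call `ks.test(u, "punif")` returns the same distance together with a p value, although it may warn about ties because the percentiles are multiples of $1/M$.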


4.5 APPLICATION 


In this section, we illustrate nonlinear time series models by analyzing the quarterly 
U.S. civilian unemployment rate, seasonally adjusted, from 1948 to 1993. This 
series was analyzed in detail by Montgomery et al. (1998). We repeat some of 
the analyses here using nonlinear models. Figure 4.11 shows the time plot of 
the data. Well-known characteristics of the series include that (a) it tends to move 
countercyclically with U.S. business cycles, and (b) the rate rises quickly but decays 
slowly. The latter characteristic suggests that the dynamic structure of the series is 
nonlinear. 

Denote the series by $x_t$ and let $\Delta x_t = x_t - x_{t-1}$ be the change in unemployment
rate. The linear model

$$(1 - 0.31B^4)(1 - 0.65B)\Delta x_t = (1 - 0.78B^4)a_t, \qquad \hat{\sigma}_a^2 = 0.090 \qquad (4.52)$$

was built by Montgomery et al. (1998), where the standard errors of the three
coefficients are 0.11, 0.06, and 0.07, respectively. This is a seasonal model even
though the data were seasonally adjusted. It indicates that the seasonal adjustment
procedure used did not successfully remove the seasonality. This model is used as
a benchmark model for forecasting comparison.

Figure 4.11 Time plot of U.S. quarterly unemployment rate, seasonally adjusted, from 1948 to 1993.

To test for nonlinearity, we apply some of the nonlinearity tests of Section 4.2
with an AR(5) model for the differenced series $\Delta x_t$. The results are given
in Table 4.3. All of the tests reject the linearity assumption. In fact, the
linearity assumption is rejected for all AR($p$) models we applied, where $p = 2,
\ldots, 10$.

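Using a modeling procedure similar to that of Tsay (1989), Montgomery et al.
(1998) build the following TAR model for the $\Delta x_t$ series:

$$\Delta x_t = \begin{cases} 0.01 + 0.73\Delta x_{t-1} + 0.10\Delta x_{t-2} + a_{1t}, & \text{if } \Delta x_{t-2} < 0.1, \\ 0.18 + 0.80\Delta x_{t-1} - 0.56\Delta x_{t-2} + a_{2t}, & \text{otherwise.} \end{cases} \qquad (4.53)$$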

The sample variances of $a_{1t}$ and $a_{2t}$ are 0.76 and 0.165, respectively, the standard
errors of the three coefficients of regime 1 are 0.03, 0.10, and 0.12, respectively,
and those of regime 2 are 0.09, 0.1, and 0.16. This model says that the change in the
U.S. quarterly unemployment rate, $\Delta x_t$, behaves like a piecewise linear model in
the reference space of $x_{t-2} - x_{t-3}$ with threshold 0.1. Intuitively, the model implies
that the dynamics of unemployment act differently depending on the recent change
in the unemployment rate. In the first regime, the unemployment rate has had either
a decrease or a minor increase. Here the economy should be stable, and essentially
the change in the rate follows a simple AR(1) model because the lag-2 coefficient is
insignificant. In the second regime, there is a substantial jump in the unemployment
rate (0.1 or larger). This typically corresponds to the contraction phase in the
business cycle. It is also the period during which government interventions and
industrial restructuring are likely to occur. Here $\Delta x_t$ follows an AR(2) model with a
positive constant, indicating an upward trend in $x_t$. The AR(2) polynomial contains
two complex characteristic roots, which indicate possible cyclical behavior in $\Delta x_t$.
Consequently, the chance of having a turning point in $x_t$ increases, suggesting
that the period of large increases in $x_t$ should be short. This implies that the
contraction phases in the U.S. economy tend to be shorter than the expansion
phases.
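The claim about complex characteristic roots in the second regime of Eq. (4.53) can be checked numerically. The R sketch below solves $1 - 0.80B + 0.56B^2 = 0$ with `polyroot` and converts the argument of a root into an implied average cycle length, using the standard result for an AR(2) model with complex roots; the variable names are illustrative.

```r
# Characteristic roots of the regime-2 AR(2) polynomial 1 - 0.80 B + 0.56 B^2.
phi   <- c(0.80, -0.56)                  # AR coefficients of regime 2
roots <- polyroot(c(1, -phi))            # coefficients in increasing powers of B
print(roots)

is_complex <- all(abs(Im(roots)) > 1e-8) # TRUE when the roots form a complex pair
modulus    <- Mod(roots)                 # moduli above 1 indicate stationarity
cycle      <- 2 * pi / abs(Arg(roots[1]))  # average cycle length in quarters
print(cycle)
```

Under these coefficients the implied average cycle is a little over six quarters, which is in line with the interpretation that periods of large increases in $x_t$ tend to be short.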


TABLE 4.3 Nonlinearity Test for Changes in the U.S. Quarterly Unemployment
Rate: 1948.II-1993.IV^a

Type       Ori-F    LST      TAR(1)   TAR(2)   TAR(3)   TAR(4)
Test       2.80     2.83     2.41     2.16     2.84     2.98
p Value    .0007    .0002    .0298    .0500    .0121    .0088

^a An AR(5) model was used in the tests, where LST denotes the test of Luukkonen et al. (1988) and
TAR(d) means threshold test with delay d.
Applying a Markov chain Monte Carlo method, Montgomery et al. (1998) obtain
the following Markov switching model for $\Delta x_t$:

$$\Delta x_t = \begin{cases} -0.07 + 0.38\Delta x_{t-1} - 0.05\Delta x_{t-2} + \epsilon_{1t} & \text{if } s_t = 1, \\ 0.16 + 0.86\Delta x_{t-1} - 0.38\Delta x_{t-2} + \epsilon_{2t} & \text{if } s_t = 2. \end{cases} \qquad (4.54)$$

The conditional means of $\Delta x_t$ are $-0.10$ for $s_t = 1$ and 0.31 for $s_t = 2$. Thus, the
first state represents the expansionary periods in the economy, and the second state
represents the contractions. The sample variances of $\epsilon_{1t}$ and $\epsilon_{2t}$ are 0.031 and 0.192,
respectively. The standard errors of the three parameters in state $s_t = 1$ are 0.03,
0.14, and 0.11, and those of state $s_t = 2$ are 0.04, 0.13, and 0.14, respectively. The
state transition probabilities are $P(s_t = 2 | s_{t-1} = 1) = 0.084\,(0.060)$ and
$P(s_t = 1 | s_{t-1} = 2) = 0.126\,(0.053)$, where the number in parentheses is the corresponding
standard error. This model implies that in the second state the unemployment rate $x_t$
has an upward trend with an AR(2) polynomial possessing complex characteristic
roots. This feature of the model is similar to the second regime of the TAR model
in Eq. (4.53). In the first state, the unemployment rate $x_t$ has a slightly decreasing
trend with a much weaker autoregressive structure.
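The conditional means quoted for Eq. (4.54) follow from the usual AR(2) formula $\phi_0/(1 - \phi_1 - \phi_2)$ applied within each state. The R sketch below reproduces them and also derives two quantities that are not reported in the text, namely the expected duration of each state and the stationary state probabilities implied by the transition probabilities; these are standard two-state Markov chain results, and the variable names are illustrative.

```r
# Within-state means of the Markov switching model in Eq. (4.54).
c0   <- c(-0.07, 0.16)                # constants of states 1 and 2
phi1 <- c(0.38, 0.86)                 # lag-1 coefficients
phi2 <- c(-0.05, -0.38)               # lag-2 coefficients
cond_mean <- c0 / (1 - phi1 - phi2)
round(cond_mean, 2)

# Transition probabilities reported in the text
p12 <- 0.084                          # P(s_t = 2 | s_{t-1} = 1)
p21 <- 0.126                          # P(s_t = 1 | s_{t-1} = 2)

# Derived quantities: expected stay in each state (quarters) and the
# long-run probability of being in each state
duration   <- c(state1 = 1 / p12, state2 = 1 / p21)
stationary <- c(state1 = p21, state2 = p12) / (p12 + p21)
round(duration, 1)
round(stationary, 2)
```

The shorter expected stay in state 2 agrees with the reading that contractions are briefer than expansions.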


Forecasting Performance 
A rolling procedure was used by Montgomery et al. (1998) to forecast the unem- 
ployment rate x,. The procedure works as follows: 


1. Begin with forecast origin $T = 83$, corresponding to 1968.II, which was used
in the literature to monitor the performance of various econometric models in
forecasting unemployment rate. Estimate the linear, TAR, and MSA models
using the data from 1948.I to the forecast origin (inclusive).

2. Perform 1-quarter to 5-quarter ahead forecasts and compute the forecast errors
of each model. Forecasts of nonlinear models used are computed by using
the parametric bootstrap method of Section 4.4.

3. Advance the forecast origin by 1 and repeat the estimation and forecasting
processes until all data are employed.


4. Use MSE and mean forecast error to compare performance of the models. 
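The rolling procedure can be sketched in R as below. This is an illustration only: it uses simulated data and plain AR($p$) models fitted with `arima` in place of the linear, TAR, and MSA models of the study, and it stops the forecast origin $h$ periods before the end of the sample so that every origin has a full set of 1-step to $h$-step errors. The function name `rolling_errors` is hypothetical.

```r
# Rolling-origin forecast errors for an AR(p) model (illustrative stand-in).
rolling_errors <- function(x, start, h = 5, p = 1) {
  origins <- start:(length(x) - h)
  err <- matrix(NA_real_, nrow = length(origins), ncol = h,
                dimnames = list(NULL, paste0(1:h, "-step")))
  for (k in seq_along(origins)) {
    T0  <- origins[k]
    fit <- arima(x[1:T0], order = c(p, 0, 0), method = "ML")  # re-estimate
    fc  <- as.numeric(predict(fit, n.ahead = h)$pred)         # 1..h steps ahead
    err[k, ] <- x[(T0 + 1):(T0 + h)] - fc                     # forecast errors
  }
  err
}

set.seed(2024)
x <- as.numeric(arima.sim(list(ar = 0.6), n = 120))
err_bench <- rolling_errors(x, start = 83, h = 5, p = 1)   # benchmark model
err_alt   <- rolling_errors(x, start = 83, h = 5, p = 3)   # competing model

rel_mse  <- colMeans(err_alt^2) / colMeans(err_bench^2)    # relative MSE
mean_err <- colMeans(err_alt)                              # mean forecast error
round(rbind(rel_mse, mean_err), 3)
```

A relative MSE below 1 at a given horizon indicates that the competing model forecasts better than the benchmark at that horizon.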


Table 4.4 shows the relative MSE of forecasts and mean forecast errors for the 
linear model in Eq. (4.52), the TAR model in Eq. (4.53), and the MSA model in 
Eq. (4.54), using the linear model as a benchmark. The comparisons are based on 
overall performance as well as the status of the U.S. economy at the forecast origin. 
From the table, we make the following observations: 


1. For the overall comparison, the TAR model and the linear model are very 
close in MSE, but the TAR model has smaller biases. Yet the MSA model 
has the highest MSE and smallest biases. 
TABLE 4.4 Out-of-Sample Forecast Comparison among Linear, TAR, and MSA
Models for the U.S. Quarterly Unemployment Rate^a


Relative MSE of Forecast 
Model 1-step 2-step 3-step 4-step 5-step 


Overall Comparison 


Linear 1.00 1.00 1.00 1.00 1.00 
TAR 1.00 1.04 0.99 0.98 1.03 
MSA 1.19 1.39 1.40 1.45 1.61 
MSE 0.08 0.31 0.67 1.13 1.54 


Forecast Origins in Economic Contractions 


Linear 1.00 1.00 1.00 1.00 1.00 
TAR 0.85 0.91 0.83 0.72 0.72 
MSA 0.97 1.03 0.96 0.86 1.02 
MSE 0.22 0.97 2.14 3.38 3.46 


Forecast Origins in Economic Expansions 


Linear 1.00 1.00 1.00 1.00 1.00 
TAR 1.06 1.13 1.10 1.15 1.17
MSA 1.31 1.64 1.73 1.84 1.87 
MSE 0.06 0.21 0.45 0.78 1.24 


Mean of Forecast Errors

Model     1-step   2-step   3-step   4-step   5-step

Overall Comparison

Linear     0.03     0.09     0.17     0.25     0.33
TAR       -0.10    -0.02    -0.03    -0.03    -0.01
MSA        0.00    -0.02    -0.04    -0.07    -0.12

Forecast Origins in Economic Contractions

Linear     0.31     0.68     1.08     1.41     1.38
TAR        0.24     0.56     0.87     1.01     0.86
MSA        0.20     0.41     0.57     0.52     0.14

Forecast Origins in Economic Expansions

Linear    -0.01     0.00     0.03     0.08     0.17
TAR       -0.05    -0.11    -0.17    -0.19    -0.14
MSA       -0.03    -0.08    -0.13    -0.17    -0.16

^a The starting forecast origin is 1968.II, where the row marked by MSE shows the MSE of the benchmark
linear model.

2. For forecast origins in economic contractions, the TAR model shows
improvements over the linear model both in MSE and bias. The MSA model 
also shows some improvement over the linear model, but the improvement 
is not as large as that of the TAR model. 


3. For forecast origins in economic expansions, the linear model outperforms 
both nonlinear models. 


The results suggest that the contributions of nonlinear models over linear ones in 
forecasting the U.S. quarterly unemployment rate are mainly in the periods when 
the U.S. economy is in contraction. This is not surprising because, as mentioned 
before, it is during the economic contractions that government interventions and 
industrial restructuring are most likely to occur. These external events could intro- 
duce nonlinearity in the U.S. unemployment rate. Intuitively, such improvements 
are important because it is during the contractions that people pay more attention 
to economic forecasts. 


APPENDIX A: SOME RATS PROGRAMS FOR NONLINEAR 
VOLATILITY MODELS 


Program Used to Estimate an AR(2)-TAR-GARCH(1,1) Model for Daily Log
Returns of IBM Stock
Assume that the data file is d-ibmln03.txt.

all 0 10446:1
open data d-ibmln03.txt
data(org=obs) / rt
set h = 0.0
nonlin mu p2 a0 a1 b1 a2 b2
* shock of the mean equation (constant plus a lag-2 AR term)
frml at = rt(t)-mu-p2*rt(t-2)
* variance equation: the a2 and b2 terms enter only when at(t-1) < 0
frml gvar = a0 + a1*at(t-1)**2+b1*h(t-1) $
 + %if(at(t-1) < 0,a2*at(t-1)**2+b2*h(t-1),0)
* Gaussian log likelihood of one observation
frml garchln = -0.5*log(h(t)=gvar(t))-0.5*at(t)**2/h(t)
smpl 4 10446
compute mu = 0.03, p2 = -0.03
compute a0 = 0.07, a1 = 0.05, a2 = 0.05, b1 = 0.85, b2 = 0.05
maximize(method=simplex,iterations=10) garchln
smpl 4 10446
maximize(method=bhhh,recursive,iterations=150) garchln
set fv = gvar(t)
set resid = at(t)/sqrt(fv(t))
set residsq = resid(t)*resid(t)
* Ljung-Box statistics of the standardized residuals and their squares
cor(qstats,number=20,span=10) resid
cor(qstats,number=20,span=10) residsq
Program Used to Estimate a Smooth TAR Model for the Monthly Simple
Returns of 3M Stock
The data file is m-3m4608.txt.

all 0 755:1
open data m-3m4608.txt
data(org=obs) / date mmm
set h = 0.0
nonlin a0 a1 a2 a00 a11 mu
frml at = mmm(t) - mu
frml var1 = a0+a1*at(t-1)**2+a2*at(t-2)**2
frml var2 = a00+a11*at(t-1)**2
* logistic transition in at(t-1) switches var2 on smoothly
frml gvar = var1(t)+var2(t)/(1.0+exp(-at(t-1)*1000.0))
frml garchlog = -0.5*log(h(t)=gvar(t))-0.5*at(t)**2/h(t)
smpl 3 623
compute a0 = .01, a1 = 0.2, a2 = 0.1
compute a00 = .01, a11 = -.2, mu = 0.02
maximize(method=bhhh,recursive,iterations=150) garchlog
set fv = gvar(t)
set resid = at(t)/sqrt(fv(t))
set residsq = resid(t)*resid(t)
cor(qstats,number=20,span=10) resid
cor(qstats,number=20,span=10) residsq


APPENDIX B: R AND S-PLUS COMMANDS FOR NEURAL NETWORK 


The following commands are used in R or S-Plus to build the 3-2-1 skip-layer
feed-forward network of Example 4.7. A line starting with # denotes a comment.
The data file is m-ibmln.txt. The library used is nnet.


# load the nnet library
library(nnet)
# load the data into R or S-Plus workspace.
x <- scan(file = "m-ibmln.txt")
# select the output: r(t)
y <- x[4:864]
# obtain the input variables: r(t-1), r(t-2), and r(t-3)
ibm.x <- cbind(x[3:863], x[2:862], x[1:861])
# build a 3-2-1 network with skip layer connections
# and linear output.
ibm.nn <- nnet(ibm.x, y, size = 2, linout = TRUE, skip = TRUE, maxit = 10000,
               decay = 1e-2, reltol = 1e-7, abstol = 1e-7, range = 1.0)
# print the summary results of the network
summary(ibm.nn)
# compute & print the residual sum of squares.
sse <- sum((y - predict(ibm.nn, ibm.x))^2)
print(sse)
# eigenvalues of the Hessian at the fitted weights
# (the function is nnetHess in R's nnet library and nnet.Hess in S-Plus)
eigen(nnetHess(ibm.nn, ibm.x, y), TRUE)$values
# set up the input variables in the forecasting subsample
ibm.p <- cbind(x[864:887], x[863:886], x[862:885])
# compute the forecasts
yh <- predict(ibm.nn, ibm.p)
# the observed returns in the forecasting subsample
yo <- x[865:888]
# compute & print the sum of squares of forecast errors
ssfe <- sum((yo - yh)^2)
print(ssfe)
# quit S-Plus or R


EXERCISES 


4.1. Consider the daily simple returns of Johnson & Johnson stock from January
1998 to December 2008. The data are in the file d-jnj9808.txt or can be
obtained from CRSP. Convert the returns into log returns in percentage. (a)
Build a GJR model for the log return series. Write down the fitted model. Is
the leverage effect significant at the 1% level? (b) Build a general threshold
volatility model for the log return series. (c) Compare the two TGARCH
models.

4.2. Consider the monthly simple returns of General Electric (GE) stock from
January 1926 to December 2008 with 996 observations. You may download
the data from CRSP or use the file m-ge2608.txt on the Web. Convert
the returns into log returns in percentages. Build a TGARCH model with
GED innovations for the series using $a_{t-1}$ as the threshold variable with zero
threshold, where $a_{t-1}$ is the shock at time $t - 1$. Write down the fitted model.
Is the leverage effect significant at the 5% level?

4.3. Suppose that the monthly log returns of GE stock, measured in percentages,
follow a smooth threshold IGARCH(1,1) model. For the sampling period from
January 1926 to December 2008, the fitted model is

$$r_t = 1.14 + a_t, \qquad a_t = \sigma_t \epsilon_t,$$

$$\sigma_t^2 = 0.119a_{t-1}^2 + 0.881\sigma_{t-1}^2 + \frac{1}{1 + \exp(-10a_{t-1})}\left(4.276 - 0.084\sigma_{t-1}^2\right),$$

where all of the estimates are highly significant, the coefficient 10 in the
exponent is fixed a priori to simplify the estimation, and $\{\epsilon_t\}$ are iid $N(0, 1)$.
Assume that $a_{996} = -5.06$ and $\sigma_{996}^2 = 50.5$. What is the 1-step-ahead volatility
forecast $\hat{\sigma}_{996}^2(1)$? Suppose instead that $a_{996} = 5.06$. What is the 1-step-ahead
volatility forecast $\hat{\sigma}_{996}^2(1)$?
4.4. Suppose that the monthly log returns, in percentages, of a stock follow the
following Markov switching model:

$$r_t = 1.25 + a_t, \qquad a_t = \sigma_t \epsilon_t,$$

$$\sigma_t^2 = \begin{cases} 0.10a_{t-1}^2 + 0.93\sigma_{t-1}^2 & \text{if } s_t = 1, \\ 4.24 + 0.10a_{t-1}^2 + 0.78\sigma_{t-1}^2 & \text{if } s_t = 2, \end{cases}$$

where the transition probabilities are

$$P(s_t = 2 | s_{t-1} = 1) = 0.15, \qquad P(s_t = 1 | s_{t-1} = 2) = 0.05.$$

Suppose that $a_{100} = 6.0$, $\sigma_{100}^2 = 50.0$, and $s_{100} = 2$ with probability 1.0. What
is the 1-step-ahead volatility forecast at the forecast origin $t = 100$? Also, if
the probability of $s_{100} = 2$ is reduced to 0.8, what is the 1-step-ahead volatility
forecast at the forecast origin $t = 100$?

4.5. Consider the monthly simple returns of GE stock from January 1926 to December
2008. Use the last three years of data for forecasting evaluation.

(a) Using lagged returns $r_{t-1}, r_{t-2}, r_{t-3}$ as input, build a 3-2-1 feed-forward
network to forecast 1-step-ahead returns. Calculate the mean squared error
of forecasts.

(b) Again, use lagged returns $r_{t-1}, r_{t-2}, r_{t-3}$ and their signs (directions) to
build a 6-5-1 feed-forward network to forecast the 1-step-ahead direction
of GE stock price movement with 1 denoting upward movement. Calculate
the mean squared error of forecasts.

Note: Let rtn denote a time series in R or S-Plus. To create a direction
variable for rtn, use the command

drtn = ifelse(rtn > 0, 1, 0)

4.6. Because of the existence of inverted yield curves in the term structure of
interest rates, the spread of interest rates should be nonlinear. To verify this,
consider the weekly U.S. interest rates of (a) Treasury 1-year constant maturity
rate and (b) Treasury 3-year constant maturity rate. As in Chapter 2, denote
the two interest rates by $r_{1t}$ and $r_{3t}$, respectively, and the data span is from
January 5, 1962, to April 10, 2009. The data are in files w-gs3yr.txt and
w-gs1yr.txt on the Web and can be obtained from the Federal Reserve
Bank of St. Louis.

(a) Let $s_t = r_{3t} - r_{1t}$ be the spread in log interest rates. Is $\{s_t\}$ linear? Perform
some nonlinearity tests and draw the conclusion using the 5% significance
level.

(b) Let $s_t^* = (r_{3t} - r_{3,t-1}) - (r_{1t} - r_{1,t-1}) = s_t - s_{t-1}$ be the change in interest
rate spread. Is $\{s_t^*\}$ linear? Perform some nonlinearity tests and draw
the conclusion using the 5% significance level.

(c) Build a threshold model for the $s_t$ series and check the fitted model.

(d) Build a threshold model for the $s_t^*$ series and check the fitted model.
CHAPTER 5


High-Frequency Data Analysis 
and Market Microstructure 


High-frequency data are observations taken at fine time intervals. In finance, they 
often mean observations taken daily or at a finer time scale. These data have 
become available primarily due to advances in data acquisition and processing 
techniques, and they have attracted much attention because they are important in 
empirical study of market microstructure and realized volatility. The ultimate high- 
frequency data in finance are the transaction-by-transaction or trade-by-trade data in 
security markets. Here time is often measured in seconds. The Trades and Quotes 
(TAQ) database of the New York Stock Exchange (NYSE) contains all equity 
transactions reported on the Consolidated Tape from 1992 to the present, which 
includes transactions on the NYSE, AMEX, NASDAQ, and the regional exchanges. 
The Berkeley Options Data Base provides similar data for options transactions 
from August 1976 to December 1996. More high-frequency options data are also 
available; see the website of Chicago Board Options Exchange. Transactions data 
for many other securities and markets, both domestic and foreign, are continuously 
collected and processed. Wood (2000) provides some historical perspective of high- 
frequency financial study. 

High-frequency financial data are important in studying a variety of issues related 
to the trading process and market microstructure. They can be used to compare the 
efficiency of different trading systems in price discovery (e.g., the open out-cry 
system of the NYSE and the computer trading system of NASDAQ). They can 
also be used to study the dynamics of bid-and-ask quotes of a particular stock (e.g., 
Hasbrouck, 1999; Zhang, Russell, and Tsay, 2008). In an order-driven stock market 
(e.g., the Taiwan Stock Exchange), high-frequency data can be used to study the 
order dynamics and, more interesting, to investigate the question of “who provides 
the market liquidity.” Cho, Russell, Tiao, and Tsay (2003) use intraday 5-minute 
returns of more than 340 stocks traded on the Taiwan Stock Exchange to study the 
impact of daily stock price limits and find significant evidence of magnet effects
toward the price ceiling. 

However, high-frequency data have some unique characteristics that do not 
appear in lower frequencies. Analysis of these data thus introduces new challenges 
to financial economists and statisticians. In this chapter, we study these special 
characteristics, consider methods for analyzing high-frequency data, and discuss 
implications of the results obtained. In particular, we discuss nonsynchronous 
trading, bid-ask spread, duration models, price movements, and bivariate models 
for price changes and time durations between transactions associated with price 
changes. The models discussed are also applicable to other scientific areas such as 
telecommunications and environmental studies. 


5.1 NONSYNCHRONOUS TRADING 


We begin with nonsynchronous trading. Stock trading, such as that on the NYSE, 
does not occur in a synchronous manner; different stocks have different trading 
frequencies, and even for a single stock the trading intensity varies from hour to 
hour and from day to day. Yet we often analyze a return series in a fixed time 
interval such as daily, weekly, or monthly. For daily series, the price of a stock is 
its closing price, which is the last transaction price of the stock in a trading day. The 
actual time of the last transaction of the stock varies from day to day. As such, we 
incorrectly treat daily returns as an equally spaced time series with a 24-hour 
interval. It turns out that such an assumption can lead to erroneous conclusions 
about the predictability of stock returns even if the true return series are serially 
independent. 

For daily stock returns, nonsynchronous trading can introduce (a) lag-1 cross 
correlation between stock returns, (b) lag-1 serial correlation in a portfolio return, 
and (c) in some situations negative serial correlations of the return series of a single 
stock. Consider stocks A and B. Assume that the two stocks are independent, and 
stock A is traded more frequently than stock B. For special news affecting the 
market that arrives near the closing hour on one day, stock A is more likely than 
B to show the effect of the news on the same day simply because A is traded 
more frequently. The effect of the news on B will eventually appear, but it may be 
delayed until the following trading day. If this situation indeed happens, the return of 
stock A appears to lead that of stock B. Consequently, the return series may show 
a significant lag-1 cross correlation from A to B even though the two stocks are 
independent. For a portfolio that holds stocks A and B, the prior cross correlation 
would become a significant lag-1 serial correlation. 

In a more complicated manner, nonsynchronous trading can also induce erroneous 
negative serial correlations for a single stock. There are several models 
available in the literature to study this phenomenon; see Campbell, Lo, and 
MacKinlay (1997) and the references therein. Here we adopt a simplified version 
of the model proposed in Lo and MacKinlay (1990). Let $r_t$ be the continuously 
compounded return of a security at the time index $t$. For simplicity, assume that 
$\{r_t\}$ is a sequence of independent and identically distributed random variables 
with mean $E(r_t) = \mu$ and variance $\mathrm{Var}(r_t) = \sigma^2$. For each time period, 
the probability that the security is not traded is $\pi$, which is time invariant and 
independent of $r_t$. Let $r_t^o$ be the observed return. When there is no trade at 
time index $t$, we have $r_t^o = 0$ because there is no information available. Yet 
when there is a trade at time index $t$, we define $r_t^o$ as the cumulative return 
from the previous trade (i.e., $r_t^o = r_t + r_{t-1} + \cdots + r_{t-k_t}$, where $k_t$ is the 
largest nonnegative integer such that no trade occurred in the periods 
$t-k_t, t-k_t+1, \ldots, t-1$). Mathematically, the relationship between $r_t$ and 
$r_t^o$ is 

$$
r_t^o = \begin{cases}
0 & \text{with probability } \pi, \\
r_t & \text{with probability } (1-\pi)^2, \\
r_t + r_{t-1} & \text{with probability } (1-\pi)^2\pi, \\
r_t + r_{t-1} + r_{t-2} & \text{with probability } (1-\pi)^2\pi^2, \\
\quad\vdots & \quad\vdots \\
\sum_{i=0}^{k} r_{t-i} & \text{with probability } (1-\pi)^2\pi^k, \\
\quad\vdots & \quad\vdots
\end{cases} \tag{5.1}
$$

These probabilities are easy to understand. For example, $r_t^o = r_t$ if and only if 
there are trades at both $t$ and $t-1$, $r_t^o = r_t + r_{t-1}$ if and only if there are 
trades at $t$ and $t-2$, but no trade at $t-1$, and $r_t^o = r_t + r_{t-1} + r_{t-2}$ if and 
only if there are trades at $t$ and $t-3$, but no trades at $t-1$ and $t-2$, and so on. 
As expected, the total probability is 1 given by 

$$
\pi + (1-\pi)^2[1 + \pi + \pi^2 + \cdots] = \pi + (1-\pi)^2\frac{1}{1-\pi} = \pi + 1 - \pi = 1.
$$


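We are ready to consider the moment equations of the observed return series 
$\{r_t^o\}$. First, the expectation of $r_t^o$ is 

$$
\begin{aligned}
E(r_t^o) &= (1-\pi)^2 E(r_t) + (1-\pi)^2\pi E(r_t + r_{t-1}) + \cdots \\
&= (1-\pi)^2\mu + (1-\pi)^2\pi\,2\mu + (1-\pi)^2\pi^2\,3\mu + \cdots \\
&= (1-\pi)^2\mu\,[1 + 2\pi + 3\pi^2 + 4\pi^3 + \cdots] \\
&= (1-\pi)^2\mu\,\frac{1}{(1-\pi)^2} = \mu.
\end{aligned} \tag{5.2}
$$

In the prior derivation, we use the result $1 + 2\pi + 3\pi^2 + 4\pi^3 + \cdots = 1/(1-\pi)^2$. 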
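Next, for the variance of $r_t^o$, we use $\mathrm{Var}(r_t^o) = E[(r_t^o)^2] - [E(r_t^o)]^2$ and 

\begin{align}
E(r_t^o)^2 &= (1-\pi)^2 E[(r_t)^2] + (1-\pi)^2\pi E[(r_t + r_{t-1})^2] + \cdots \notag \\
&= (1-\pi)^2[(\sigma^2 + \mu^2) + \pi(2\sigma^2 + 4\mu^2) + \pi^2(3\sigma^2 + 9\mu^2) + \cdots] \tag{5.3} \\
&= (1-\pi)^2\{\sigma^2[1 + 2\pi + 3\pi^2 + \cdots] + \mu^2[1 + 4\pi + 9\pi^2 + \cdots]\} \tag{5.4} \\
&= \sigma^2 + \mu^2\left[\frac{2}{1-\pi} - 1\right]. \tag{5.5}
\end{align}

In Eq. (5.3), we use 

$$
E\left(\sum_{i=0}^{k} r_{t-i}\right)^2 = \mathrm{Var}\left(\sum_{i=0}^{k} r_{t-i}\right) + \left[E\left(\sum_{i=0}^{k} r_{t-i}\right)\right]^2 = (k+1)\sigma^2 + [(k+1)\mu]^2
$$

under the serial independence assumption of $r_t$. Using techniques similar to that 
of Eq. (5.2), we can show that the first term of Eq. (5.4) reduces to $\sigma^2$. For the 
second term of Eq. (5.4), we use the identity 

$$
1 + 4\pi + 9\pi^2 + 16\pi^3 + \cdots = \frac{2}{(1-\pi)^3} - \frac{1}{(1-\pi)^2},
$$

which can be obtained as follows. Let 

$$
H = 1 + 4\pi + 9\pi^2 + 16\pi^3 + \cdots \quad\text{and}\quad G = 1 + 3\pi + 5\pi^2 + 7\pi^3 + \cdots.
$$

Then $(1-\pi)H = G$ and 

$$
(1-\pi)G = 1 + 2\pi + 2\pi^2 + 2\pi^3 + \cdots = 2(1 + \pi + \pi^2 + \cdots) - 1 = \frac{2}{1-\pi} - 1.
$$

Consequently, from Eqs. (5.2) and (5.5), we have 

$$
\mathrm{Var}(r_t^o) = \sigma^2 + \mu^2\left[\frac{2}{1-\pi} - 1\right] - \mu^2 = \sigma^2 + \frac{2\pi\mu^2}{1-\pi}. \tag{5.6}
$$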


Consider next the lag-1 autocovariance of $\{r_t^o\}$. Here we use 
$\mathrm{Cov}(r_t^o, r_{t-1}^o) = E(r_t^o r_{t-1}^o) - E(r_t^o)E(r_{t-1}^o) = E(r_t^o r_{t-1}^o) - \mu^2$. 
The question then reduces to finding $E(r_t^o r_{t-1}^o)$. Notice that $r_t^o r_{t-1}^o$ is 
zero if there is no trade at $t$, no trade at $t-1$, or no trade at both $t$ and $t-1$. 
Therefore, we have 

$$
r_t^o r_{t-1}^o = \begin{cases}
0 & \text{with probability } 2\pi - \pi^2, \\
r_t r_{t-1} & \text{with probability } (1-\pi)^3, \\
r_t(r_{t-1} + r_{t-2}) & \text{with probability } (1-\pi)^3\pi, \\
r_t(r_{t-1} + r_{t-2} + r_{t-3}) & \text{with probability } (1-\pi)^3\pi^2, \\
\quad\vdots & \quad\vdots \\
r_t\left(\sum_{i=1}^{k} r_{t-i}\right) & \text{with probability } (1-\pi)^3\pi^{k-1}, \\
\quad\vdots & \quad\vdots
\end{cases} \tag{5.7}
$$

Again the total probability is unity. To understand the prior result, notice that 
$r_t^o r_{t-1}^o = r_t r_{t-1}$ if and only if there are three consecutive trades at $t-2$, 
$t-1$, and $t$. Using Eq. (5.7) and the fact that $E(r_t r_{t-j}) = E(r_t)E(r_{t-j}) = \mu^2$ 
for $j > 0$, we have 

$$
\begin{aligned}
E(r_t^o r_{t-1}^o) &= (1-\pi)^3\left\{E(r_t r_{t-1}) + \pi E[r_t(r_{t-1} + r_{t-2})] + \pi^2 E\left[r_t\left(\sum_{i=1}^{3} r_{t-i}\right)\right] + \cdots\right\} \\
&= (1-\pi)^3\mu^2[1 + 2\pi + 3\pi^2 + \cdots] = (1-\pi)\mu^2.
\end{aligned}
$$

The lag-1 autocovariance of $\{r_t^o\}$ is then 

$$
\mathrm{Cov}(r_t^o, r_{t-1}^o) = -\pi\mu^2. \tag{5.8}
$$

Provided that $\mu$ is not zero, the nonsynchronous trading induces a negative lag-1 
autocorrelation in $r_t^o$ given by 

$$
\rho_1(r_t^o) = \frac{-(1-\pi)\pi\mu^2}{(1-\pi)\sigma^2 + 2\pi\mu^2}.
$$

In general, we can extend the prior result and show that 

$$
\mathrm{Cov}(r_t^o, r_{t-j}^o) = -\mu^2\pi^j, \quad j \ge 1.
$$

The magnitude of the lag-1 ACF depends on the choices of $\mu$, $\pi$, and $\sigma$ and can 
be substantial. Thus, when $\mu \ne 0$, the nonsynchronous trading induces negative 
autocorrelations in an observed security return series. 
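
The moment results in Eqs. (5.2), (5.6), and (5.8) can be checked numerically. The 
following Python sketch simulates the simplified nontrading model and compares 
sample moments of the observed returns with the theoretical values. The Gaussian 
distribution for the true returns, the parameter values, and the function names are 
assumptions made purely for illustration; the model itself only requires that the 
true returns be independent and identically distributed. 

```python
import numpy as np


def simulate_observed_returns(n, mu, sigma, pi, seed=0):
    """Simulate the observed return series r_t^o of the nontrading model."""
    rng = np.random.default_rng(seed)
    r = rng.normal(mu, sigma, size=n)   # true returns r_t (Gaussian is an assumption)
    no_trade = rng.random(n) < pi       # True when the security does not trade at t
    observed = np.zeros(n)              # r_t^o = 0 in periods without a trade
    pending = 0.0                       # return accumulated since the last trade
    for t in range(n):
        pending += r[t]
        if not no_trade[t]:
            observed[t] = pending       # a trade reveals the cumulative return
            pending = 0.0
    return observed


def autocov(x, lag):
    """Sample autocovariance of x at a positive lag."""
    xc = x - x.mean()
    return float(np.mean(xc[lag:] * xc[:-lag]))


mu, sigma, pi = 1.0, 1.0, 0.3           # illustrative values, not estimates
ro = simulate_observed_returns(200_000, mu, sigma, pi)
print("mean      :", ro.mean(), " theory:", mu)
print("variance  :", ro.var(), " theory:", sigma**2 + 2 * pi * mu**2 / (1 - pi))
print("lag-1 cov :", autocov(ro, 1), " theory:", -pi * mu**2)
```

A large mean relative to the standard deviation is used here only to make the 
induced autocovariance visible in a moderate sample; for realistic daily returns, 
$\mu$ is small and the effect is correspondingly weaker. 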

The previous discussion can be generalized to the return series of a portfolio 
that consists of N securities; see Campbell et al. (1997, Chapter 3). In the time 
series literature, effects of nonsynchronous trading on the return of a single security 
are equivalent to those of random temporal aggregation on a time series, with the 
nontrading probability $\pi$ governing the mechanism of aggregation. 


5.2 BID-ASK SPREAD 


In some stock exchanges (e.g., NYSE), market makers play an important role in 
facilitating trades. They provide market liquidity by standing ready to buy or sell 
whenever the public wishes to buy or sell. By market liquidity, we mean the ability 
to buy or sell significant quantities of a security quickly, anonymously, and with 
little price impact. In return for providing liquidity, market makers are granted 
monopoly rights by the exchange to post different prices for purchases and sales of 
a security. They buy at the bid price $P_b$ and sell at a higher ask price $P_a$. (For the 
public, $P_b$ is the sale price and $P_a$ is the purchase price.) The difference $P_a - P_b$ is 
called the bid-ask spread, which is the primary source of compensation for market 
makers. Typically, the bid-ask spread is small, namely, one or two cents. 

The existence of a bid-ask spread, although small in magnitude, has several 
important consequences in time series properties of asset returns. We briefly discuss 
the bid-ask bounce, namely, that the bid-ask spread introduces negative lag-1 serial 
correlation in an asset return. Consider the simple model of Roll (1984). The 
observed market price $P_t$ of an asset is assumed to satisfy 

$$
P_t = P_t^* + I_t\frac{S}{2}, \tag{5.9}
$$

where $S = P_a - P_b$ is the bid-ask spread, $P_t^*$ is the time-$t$ fundamental value of 
the asset in a frictionless market, and $\{I_t\}$ is a sequence of independent binary 
random variables with equal probabilities (i.e., $I_t = 1$ with probability 0.5 and 
$= -1$ with probability 0.5). The $I_t$ can be interpreted as an order-type indicator, 
with 1 signifying buyer-initiated transaction and $-1$ seller-initiated transaction. 
Alternatively, the model can be written as 

$$
P_t = P_t^* + \begin{cases}
+S/2 & \text{with probability } 0.5, \\
-S/2 & \text{with probability } 0.5.
\end{cases}
$$

If there is no change in $P_t^*$, then the observed process of price changes is 

$$
\Delta P_t = (I_t - I_{t-1})\frac{S}{2}. \tag{5.10}
$$

Under the assumption of $I_t$ in Eq. (5.9), $E(I_t) = 0$ and $\mathrm{Var}(I_t) = 1$, and we have 
$E(\Delta P_t) = 0$ and 

\begin{align}
\mathrm{Var}(\Delta P_t) &= S^2/2, \tag{5.11} \\
\mathrm{Cov}(\Delta P_t, \Delta P_{t-1}) &= -S^2/4, \tag{5.12} \\
\mathrm{Cov}(\Delta P_t, \Delta P_{t-j}) &= 0, \quad j > 1. \tag{5.13}
\end{align}

Therefore, the autocorrelation function of $\Delta P_t$ is 

$$
\rho_j(\Delta P_t) = \begin{cases}
-0.5 & \text{if } j = 1, \\
0 & \text{if } j > 1.
\end{cases} \tag{5.14}
$$

The bid-ask spread thus introduces a negative lag-1 serial correlation in the series 
of observed price changes. This is referred to as the bid-ask bounce in the finance 
literature. Intuitively, the bounce can be seen as follows. Assume that the fundamental 
price $P_t^*$ is equal to $(P_a + P_b)/2$. Then $P_t$ assumes the value $P_a$ or $P_b$. If 
the previously observed price is $P_a$ (the higher value), then the current observed 
price is either unchanged or lower at $P_b$. Thus, $\Delta P_t$ is either 0 or $-S$. However, if 
the previous observed price is $P_b$ (the lower value), then $\Delta P_t$ is either 0 or $S$. The 
negative lag-1 correlation in $\Delta P_t$ becomes apparent. The bid-ask spread does not 
introduce any serial correlation beyond lag 1, however. 

A more realistic formulation is to assume that $P_t^*$ follows a random walk so that 
$\Delta P_t^* = P_t^* - P_{t-1}^* = \epsilon_t$, which forms a sequence of independent and identically 
distributed random variables with mean zero and variance $\sigma^2$. In addition, $\{\epsilon_t\}$ is 
independent of $\{I_t\}$. In this case, $\mathrm{Var}(\Delta P_t) = \sigma^2 + S^2/2$, but $\mathrm{Cov}(\Delta P_t, \Delta P_{t-j})$ 
remains unchanged. Therefore, 

$$
\rho_1(\Delta P_t) = \frac{-S^2/4}{S^2/2 + \sigma^2} \le 0.
$$

The magnitude of the lag-1 autocorrelation of $\Delta P_t$ is reduced, but the negative 
effect remains when $S = P_a - P_b > 0$. In finance, it might be of interest to study 
the components of the bid-ask spread. Interested readers are referred to Campbell 
et al. (1997) and the references therein. 
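
To see the bid-ask bounce numerically, the following Python sketch simulates the 
Roll model with and without a random-walk fundamental value and compares the 
sample lag-1 autocorrelation of the price changes with the theoretical value 
$(-S^2/4)/(S^2/2 + \sigma^2)$. The Gaussian innovations, the spread and volatility 
values, and the function names are assumptions chosen for illustration only. 

```python
import numpy as np


def simulate_roll_price_changes(n, spread, sigma=0.0, seed=0):
    """Simulate n observed price changes Delta P_t under the Roll model."""
    rng = np.random.default_rng(seed)
    indicator = rng.choice([-1.0, 1.0], size=n + 1)   # I_t: +1 buyer, -1 seller initiated
    eps = rng.normal(0.0, sigma, size=n + 1)          # innovations of the fundamental value
    fundamental = np.cumsum(eps)                      # P_t^* (constant when sigma = 0)
    observed = fundamental + indicator * spread / 2.0 # P_t = P_t^* + I_t * S / 2
    return np.diff(observed)


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of x."""
    xc = x - x.mean()
    return float(np.sum(xc[1:] * xc[:-1]) / np.sum(xc * xc))


def roll_rho1(spread, sigma=0.0):
    """Theoretical lag-1 autocorrelation of Delta P_t."""
    return (-spread**2 / 4.0) / (spread**2 / 2.0 + sigma**2)


for sigma in (0.0, 0.1):                              # illustrative volatilities
    dp = simulate_roll_price_changes(200_000, spread=0.125, sigma=sigma)
    print(f"sigma={sigma}: sample {lag1_autocorr(dp):.3f}, theory {roll_rho1(0.125, sigma):.3f}")
```

With a constant fundamental value the autocorrelation is close to the bound of 
$-0.5$; adding fundamental volatility shrinks it toward zero without changing its sign. 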

The effect of bid-ask spread continues to exist in portfolio returns and in 
multivariate financial time series. Consider the bivariate case. Denote the bivariate 
order-type indicator by $I_t = (I_{1t}, I_{2t})'$, where $I_{1t}$ is for the first security and $I_{2t}$ 
for the second security. If $I_{1t}$ and $I_{2t}$ are contemporaneously positively correlated, 
then the bid-ask spreads can introduce negative lag-1 cross correlations. 


5.3 EMPIRICAL CHARACTERISTICS OF TRANSACTIONS DATA 


Let $t_i$ be the calendar time, measured in seconds from midnight, at which the ith 
transaction of an asset takes place. Associated with the transaction are several 
variables such as the transaction price, the transaction volume, the prevailing bid and 
ask quotes, and so on. The collection of $t_i$ and the associated measurements are 
referred to as the transactions data. These data have several important character- 
istics that do not exist when the observations are aggregated over time. Some of 
the characteristics are given next. 


1. Unequally Spaced Time Intervals. Transactions such as stock tradings on an 
exchange do not occur at equally spaced time intervals. As such, the observed 
transaction prices of an asset do not form an equally spaced time series. The 
time duration between trades becomes important and might contain useful 
information about market microstructure (e.g., trading intensity). 


2. Discrete-Valued Prices. The price change of an asset from one transaction 
to the next only occurred in multiples of tick size before January 29, 2001. 
On the NYSE, the tick size was one-eighth of a dollar before June 24, 1997 
and was one-sixteenth of a dollar before January 29, 2001. Therefore, the 
price was a discrete-valued variable in transactions data. Although all equity 
markets in the United States now use the decimal system, the price change in 
consecutive trades tends to occur in multiples of one cent and can be treated 
approximately as a discrete-valued variable. In some markets, price change 
may also be subject to limit constraints set by regulators. 

3. Existence of a Daily Periodic or Diurnal Pattern. Under normal trading 
conditions, transaction activity can exhibit a periodic pattern. For instance, 
on the NYSE, transactions are “heavier” at the beginning and closing of 
the trading hours and “thinner” during lunch hour, resulting in a U-shaped 
transaction intensity. Consequently, time durations between transactions also 
exhibit a daily cyclical pattern. 

4. Multiple Transactions within a Single Second. It is possible that multiple 
transactions, even with different prices, occur at the same time. This is partly 
due to the fact that time is measured in seconds, which may be too long a 
time scale in periods of heavy trading. 


To demonstrate these characteristics, we consider first the IBM transactions data 
from November 1, 1990, to January 31, 1991. These data are from the Trades, 
Orders Reports, and Quotes (TORQ) data set; see Hasbrouck (1992). There are 
63 trading days and 60,328 transactions. To simplify the discussion, we ignore the 
price changes between trading days and focus on the transactions that occurred in 
the normal trading hours from 9:30 AM to 4:00 PM Eastern time. It is well known 
that overnight stock returns differ substantially from intraday returns; see Stoll 
and Whaley (1990) and the references therein. Table 5.1 gives the frequencies in 
percentages of price change measured in the tick size of $1/8 = $0.125. From the 
table, we make the following observations: 


1. About two-thirds of the intraday transactions were without price change. 

2. The price changed in one tick approximately 29% of the intraday transactions. 

3. Only 2.6% of the transactions were associated with two-tick price changes. 

4. Only about 1.3% of the transactions resulted in price changes of three ticks 
or more. 

5. The distribution of positive and negative price changes was approximately 
symmetric. 


Consider next the number of transactions in a 5-minute time interval. Denote 
the series by $x_t$. That is, $x_1$ is the number of IBM transactions from 9:30 AM to 
9:35 AM on November 1, 1990, Eastern time; $x_2$ is the number of transactions from 
9:35 AM to 9:40 AM; and so on. The time gaps between trading days are ignored. 
Figure 5.1(a) shows the time plot of $x_t$, and Figure 5.1(b) shows the sample ACF 
of $x_t$ for lags 1-260. Of particular interest is the cyclical pattern of the ACF with 
a periodicity of 78, which is the number of 5-minute intervals in a trading day. 
The number of transactions thus exhibits a daily pattern. To further illustrate the 
daily trading pattern, Figure 5.2 shows the average number of transactions within 
5-minute time intervals over the 63 days. There are 78 such averages. The plot 
exhibits a “smiling” or U shape, indicating heavier trading at the opening and 
closing of the market and thinner trading during the lunch hours. 


TABLE 5.1 Frequencies of Price Change in Multiples of Tick Size for IBM Stock 
from November 1, 1990, to January 31, 1991 

Number (tick)    ≤ −3     −2      −1       0       1      2    ≥ 3 
Percentage       0.66   1.33   14.53   67.06   14.53   1.27   0.63 


Figure 5.1 IBM intraday transactions data from 11/01/90 to 1/31/91: (a) number of transactions in 
5-minute time intervals and (b) sample ACF of series in part (a). 

Since we focus on transactions that occurred during normal trading hours of 
a trading day, there are 59,838 time intervals in the data. These intervals are 
called the intraday durations between trades. For IBM stock, there were 6531 
zero time intervals. That is, during the normal trading hours of the 63 trading 
days from November 1, 1990, to January 31, 1991, multiple transactions in a 
second occurred 6531 times, which is about 10.91%. Among these multiple trans- 
actions, 1002 of them had different prices, which is about 1.67% of the total 
number of intraday transactions. Therefore, multiple transactions (i.e., zero dura- 
tions) may become an issue in statistical modeling of the time durations between 
trades. 
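
As a small illustration of how the 5-minute counts and the intraday durations are 
obtained from raw time stamps, consider the following Python sketch. A regular 
trading day of 6.5 hours contains 6.5 × 12 = 78 five-minute intervals. The time 
stamps in the example are hypothetical, and the function names are our own; real 
transactions data would be processed one trading day at a time. 

```python
import numpy as np

OPEN = 9 * 3600 + 30 * 60    # 9:30 AM in seconds from midnight
CLOSE = 16 * 3600            # 4:00 PM in seconds from midnight
BIN = 5 * 60                 # width of a 5-minute interval in seconds


def regular_hours(times):
    """Keep the sorted time stamps that fall within normal trading hours."""
    times = np.sort(np.asarray(times))
    return times[(times >= OPEN) & (times <= CLOSE)]


def five_minute_counts(times):
    """Number of transactions in each of the 78 five-minute intervals of a day."""
    edges = np.arange(OPEN, CLOSE + BIN, BIN)   # 79 edges give 78 intervals
    counts, _ = np.histogram(regular_hours(times), bins=edges)
    return counts


def intraday_durations(times):
    """Durations (in seconds) between consecutive intraday transactions."""
    return np.diff(regular_hours(times))


# Hypothetical time stamps for a few trades in one day.
example_times = [34200, 34200, 34215, 34499, 34500, 40000, 57599]
x = five_minute_counts(example_times)
dt = intraday_durations(example_times)
print("number of intervals:", len(x))
print("zero durations     :", int(np.sum(dt == 0)), "out of", len(dt))
```

A zero duration arises whenever two transactions carry the same time stamp, which 
is how the multiple transactions within a single second are counted. 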

Table 5.2 provides a two-way classification of price movements. Here price 
movements are classified into “up,” “unchanged,” and “down.” We denote them 
by +, 0, and −, respectively. The table shows the price movements between two 
consecutive trades [i.e., from the (i − 1)th to the ith transaction] in the sample. 


Figure 5.2 Time plot of average number of transactions in 5-minute time intervals. There are 78 
observations, averaging over 63 trading days from 11/01/90 to 1/31/91 for IBM stock. 


TABLE 5.2 Two-Way Classification of Price Movements in Consecutive Intraday 
Trades for IBM Stock^a 

                            ith Trade 
(i − 1)th Trade        +        0        −    Margin 
+                    441     5498     3948      9887 
0                   4867    29779     5473     40119 
−                   4580     4841      410      9831 
Margin              9888    40118     9831     59837 

^a The price movements are classified into “up,” “unchanged,” and “down.” The data span is from 
November 1, 1990, to January 31, 1991. 


From the table, trade-by-trade data show that: 


1. Consecutive price increases or decreases are relatively rare, which are about 
441/59837 = 0.74% and 410/59837 = 0.69%, respectively. 


2. There is a slight edge to move from up to unchanged rather than to down; 
see row 1 of the table. 


3. There is a high tendency for the price to remain unchanged. 

4. The probabilities of moving from down to up or unchanged are about the 
same; see row 3. 
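

The observations above can be reproduced directly from the counts in Table 5.2. 
The following Python sketch, written only as an illustration, converts the counts 
into overall percentages and into row-conditional relative frequencies, that is, the 
empirical frequency of each movement at the ith trade given the movement at the 
(i − 1)th trade. The variable names are our own. 

```python
import numpy as np

labels = ["+", "0", "-"]
# Counts from Table 5.2: rows are the (i-1)th trade, columns are the ith trade.
counts = np.array([[441, 5498, 3948],
                   [4867, 29779, 5473],
                   [4580, 4841, 410]])

total = counts.sum()
# Relative frequency of each move at trade i given the move at trade i-1.
row_freq = counts / counts.sum(axis=1, keepdims=True)

print("total pairs of consecutive trades:", total)
print("consecutive increases (%):", round(100 * counts[0, 0] / total, 2))
print("consecutive decreases (%):", round(100 * counts[2, 2] / total, 2))
for label, row in zip(labels, row_freq):
    print(f"after {label}:", np.round(row, 3))
```

Reading the output row by row makes the reversal pattern explicit: after an up 
move, a further up move is far less frequent than either no change or a down move, 
and symmetrically after a down move. 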


The first observation mentioned before is a clear demonstration of bid-ask 
bounce, showing price reversals in intraday transactions data. To confirm this 
phenomenon, we consider a directional series $D_i$ for price movements, where $D_i$ 
assumes the value +1, 0, and −1 for up, unchanged, and down price movement, 
respectively, for the ith transaction. The ACF of $\{D_i\}$ has a single spike at lag 1 
with value −0.389, which is highly significant for a sample size of 59,837 and 
confirms the price reversal in consecutive trades. 
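
The construction of the directional series and its sample ACF can be sketched in 
Python as follows. The short price series is hypothetical and merely bounces 
between two quotes one tick apart; it is not the IBM data, and the function names 
are our own. 

```python
import numpy as np


def direction_series(prices):
    """D_i = +1, 0, -1 for an up, unchanged, or down move between consecutive trades."""
    return np.sign(np.diff(np.asarray(prices, dtype=float))).astype(int)


def sample_acf(x, max_lag):
    """Sample autocorrelations of x at lags 1, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc * xc)
    return np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, max_lag + 1)])


# Hypothetical transaction prices alternating between a bid and an ask.
prices = [100.0, 100.125, 100.125, 100.0, 100.125, 100.0, 100.0, 100.125]
d = direction_series(prices)
print("D_i      :", d)
print("lag-1 ACF:", sample_acf(d, 1)[0])
```

For prices that bounce between bid and ask, an up move can only be followed by 
no change or a down move, so the lag-1 sample autocorrelation of the directional 
series is negative. 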

As a second illustration, we consider the transactions data of IBM stock in 
December 1999 obtained from the TAQ database. The normal trading hours are 
from 9:30 AM to 4:00 PM Eastern time, except for December 31 when the market 
closed at 1:00 PM. Compared with the 1990-1991 data, two important changes 
have occurred. First, the number of intraday tradings has increased sixfold. There 
were 134,120 intraday tradings in December 1999 alone. The increased trading 
intensity also increased the chance of multiple transactions within a second. The 
percentage of trades with zero time duration doubled to 22.98%. At the extreme, 
there were 42 transactions within a given second that happened twice on December 
3, 1999. Second, the tick size of price movement was $1/16 = $0.0625 instead of 
$1/8. The change in tick size should reduce the bid-ask spread. Figure 5.3 shows the 
daily number of transactions in the new sample. Figure 5.4(a) shows the time plot 
of time durations between trades, measured in seconds, and Figure 5.4(b) is the 
time plot of price changes in consecutive intraday trades, measured in multiples 
of the tick size of $1/16. As expected, Figures 5.3 and 5.4(a) show clearly the 
inverse relationship between the daily number of transactions and the time interval 
between trades. Figure 5.4(b) shows two unusual price movements for IBM stock 
on December 3, 1999. They were a drop of 63 ticks followed by an immediate 
jump of 64 ticks and a drop of 68 ticks followed immediately by a jump of 68 ticks. 
Unusual price movements like these occurred infrequently in intraday transactions. 


Figure 5.3 IBM transactions data for December 1999. Box plot shows the number of transactions in 
each trading day with after-hours portion denoting number of trades with time stamp after 4:00 PM. 

Figure 5.4 IBM transactions data for December 1999. (a) Time plot of time durations between trades 
and (b) time plot of price changes in consecutive trades measured in multiples of tick size of $1/16. 
Only data during normal trading hours are included. 

Focusing on trades recorded within regular trading hours, we have 61,149 trades 
out of 133,475 with no price change. This is about 45.8% and substantially lower 
than that between November 1990 and January 1991. It seems that reducing the 
tick size increased the chance of a price change. Table 5.3 gives the percentages of 
trades associated with a price change. The price movements remain approximately 
symmetric with respect to zero. Large price movements in intraday tradings are 
still relatively rare. 


TABLE 5.3 Percentages of Intraday Transactions Associated with a Price Change 
for IBM Stock Traded in December 1999^a 

Size              1      2      3      4      5      6      7     >7 
Downward Movements 
Percentage    18.03   5.80   1.79   0.66   0.25   0.15   0.09   0.32 
Upward Movements 
Percentage    18.24   5.57   1.79   0.71   0.24   0.17   0.10   0.31 

^a The percentage of transactions without price change is 45.8% and the total number of transactions 
recorded within regular trading hours is 133,475. The size is measured in multiples of tick size $1/16. 


Figure 5.5 Transactions data of Boeing stock on December 1, 2008. (a) Price series over calendar time 
measured in seconds from midnight and (b) time plot of price changes in consecutive trades measured 
in cents. Only data during normal trading hours are included. 

Finally, we consider the transactions data of Boeing stock on December 1, 
2008. There are 43,894 transactions within the regular trading hours. Figure 5.5(a) 
shows the transaction prices versus the calendar time measured in seconds from 
midnight, and Figure 5.5(b) shows the time plot of price changes. In this 
particular instance, the price shows a downward trend within the day, but the 
price changes continue to exhibit patterns similar to those before using the decimal 
system. Figure 5.6 shows the histogram of the price changes for the Boeing stock. 
The histogram shows some distinct characteristics. First, the price changes appear to 
be symmetric with respect to zero. Second, the price changes indeed concentrate 
on multiples of one cent. Out of the 43,894 transactions, 58.5% have no price 
change; see the big spike of the histogram. Details of the summary of price changes 
for the Boeing stock are given in Table 5.4. The remaining 4.59% of the price 
changes not shown in Table 5.4 are not in multiples of one cent. 


Figure 5.6 Histogram of price changes for Boeing stock on December 1, 2008. 


TABLE 5.4 Frequencies of Price Change for Boeing Stock on December 1, 2008 

Cents         < −3     −3     −2     −1      0      1      2      3    > 3 
Percentage    1.63   1.05   3.51   12.6   58.5   12.2   3.45   0.94   1.53 


Remark. The recordkeeping of high-frequency data is often not as good as 
that of observations taken at lower frequencies. Data cleaning becomes a necessity 
in high-frequency data analysis. For transactions data, missing observations may 
happen in many ways, and the accuracy of the exact transaction time might be 
questionable for some trades. For example, recorded trading times may be beyond 
4:00 PM Eastern time even before the opening of after-hours tradings. How to handle 
these observations deserves a careful study. A proper method of data cleaning 
requires a deep understanding of the way in which the market operates. As such, 
it is important to specify clearly and precisely the methods used in data cleaning. 
These methods must be taken into consideration in making inference. 


Again, let $t_i$ be the calendar time, measured in seconds from midnight, when the 
ith transaction took place. Let $P_{t_i}$ be the transaction price. The price change from 
the (i − 1)th to the ith trade is $y_i = \Delta P_{t_i} = P_{t_i} - P_{t_{i-1}}$ and the time duration is 
$\Delta t_i = t_i - t_{i-1}$. Here it is understood that the subscript $i$ in $\Delta t_i$ and $y_i$ denotes the 
time sequence of transactions, not the calendar time. In what follows, we consider 
models for $y_i$ and $\Delta t_i$ both individually and jointly. 


5.4 MODELS FOR PRICE CHANGES 


The discreteness and concentration on “no change” make it difficult to model 
the intraday price changes. Campbell et al. (1997) discuss several econometric 
models that have been proposed in the literature. Here we mention two models 
that have the advantage of employing explanatory variables to study the intraday 
price movements. The first model is the ordered probit model used by Hausman, 
Lo, and MacKinlay (1992) to study the price movements in transactions data. The 
second model has been considered recently by McCulloch and Tsay (2000) and is 
a simplified version of the model proposed by Rydberg and Shephard (2003); see 
also Ghysels (2000). 


5.4.1 Ordered Probit Model 


Let $y_i^*$ be the unobservable price change of the asset under study (i.e., 
$y_i^* = P_{t_i}^* - P_{t_{i-1}}^*$), where $P_t^*$ is the virtual price of the asset at time $t$. The 
ordered probit model assumes that $y_i^*$ is a continuous random variable and follows 
the model 

$$
y_i^* = x_i\beta + \epsilon_i, \tag{5.15}
$$

where $x_i$ is a $p$-dimensional row vector of explanatory variables available at 
time $t_{i-1}$, $\beta$ is a $p \times 1$ parameter vector, $E(\epsilon_i|x_i) = 0$, $\mathrm{Var}(\epsilon_i|x_i) = \sigma_i^2$, and 
$\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \ne j$. The conditional variance $\sigma_i^2$ is assumed to be a positive 
function of the explanatory variable $w_i$, that is, 

$$
\sigma_i^2 = g(w_i), \tag{5.16}
$$

where $g(\cdot)$ is a positive function. For financial transactions data, $w_i$ may contain 
the time interval $t_i - t_{i-1}$ and some conditional heteroscedastic variables. Typically, 
one also assumes that the conditional distribution of $\epsilon_i$ given $x_i$ and $w_i$ is Gaussian. 

Suppose that the observed price change $y_i$ may assume $k$ possible values. In 
theory, $k$ can be infinity, but countable. In practice, $k$ is finite and may involve 
combining several categories into a single value. For example, we have $k = 7$ in 
Table 5.1, where the first value “−3 ticks” means that the price change is −3 ticks 
or lower. We denote the $k$ possible values as $\{s_1, \ldots, s_k\}$. The ordered probit model 
postulates the relationship between $y_i$ and $y_i^*$ as 

$$
y_i = s_j \quad\text{if } \alpha_{j-1} < y_i^* \le \alpha_j, \quad j = 1, \ldots, k, \tag{5.17}
$$

where $\alpha_j$ are real numbers satisfying $-\infty = \alpha_0 < \alpha_1 < \cdots < \alpha_{k-1} < \alpha_k = \infty$. 
Under the assumption of conditional Gaussian distribution, we have 

$$
\begin{aligned}
P(y_i = s_j|x_i, w_i) &= P(\alpha_{j-1} < x_i\beta + \epsilon_i \le \alpha_j|x_i, w_i) \\
&= \begin{cases}
P(x_i\beta + \epsilon_i \le \alpha_1|x_i, w_i) & \text{if } j = 1, \\
P(\alpha_{j-1} < x_i\beta + \epsilon_i \le \alpha_j|x_i, w_i) & \text{if } j = 2, \ldots, k-1, \\
P(\alpha_{k-1} < x_i\beta + \epsilon_i|x_i, w_i) & \text{if } j = k,
\end{cases} \\
&= \begin{cases}
\Phi\left[\dfrac{\alpha_1 - x_i\beta}{\sigma_i(w_i)}\right] & \text{if } j = 1, \\
\Phi\left[\dfrac{\alpha_j - x_i\beta}{\sigma_i(w_i)}\right] - \Phi\left[\dfrac{\alpha_{j-1} - x_i\beta}{\sigma_i(w_i)}\right] & \text{if } j = 2, \ldots, k-1, \\
1 - \Phi\left[\dfrac{\alpha_{k-1} - x_i\beta}{\sigma_i(w_i)}\right] & \text{if } j = k,
\end{cases}
\end{aligned} \tag{5.18}
$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal random 
variable evaluated at $x$, and we write $\sigma_i(w_i)$ to denote that $\sigma_i^2$ is a positive function 
of $w_i$. From the definition, an ordered probit model is driven by an unobservable 
continuous random variable. The observed values, which have a natural ordering, 
can be regarded as categories representing the underlying process. 
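
The category probabilities in Eq. (5.18) are simple to evaluate once the conditional 
mean $x_i\beta$, the conditional standard deviation $\sigma_i(w_i)$, and the boundaries are 
given. The following Python sketch shows the computation for a single observation. 
The boundary values and the inputs in the example are hypothetical, chosen only to 
illustrate the calculation, and the function names are our own. 

```python
import math


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordered_probit_probs(xb, sigma, boundaries):
    """Return [P(y_i = s_1), ..., P(y_i = s_k)] as in Eq. (5.18).

    xb         : conditional mean x_i * beta
    sigma      : conditional standard deviation sigma_i(w_i), must be positive
    boundaries : increasing list [alpha_1, ..., alpha_{k-1}]
    """
    # Phi evaluated at each standardized boundary, padded with Phi(-inf)=0 and Phi(inf)=1.
    cdf = [0.0] + [std_normal_cdf((a - xb) / sigma) for a in boundaries] + [1.0]
    return [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]


# Hypothetical boundaries for k = 5 categories of price change.
alphas = [-1.5, -0.5, 0.5, 1.5]
probs = ordered_probit_probs(xb=0.0, sigma=1.0, boundaries=alphas)
print([round(p, 4) for p in probs], "sum =", round(sum(probs), 6))
```

Shifting the conditional mean moves probability mass toward higher or lower 
categories, whereas a larger conditional standard deviation moves mass from the 
middle category toward the tails. 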

The ordered probit model contains parameters $\beta$, $\alpha_i$ ($i = 1, \ldots, k-1$), and 
those in the conditional variance function $\sigma_i(w_i)$ in Eq. (5.16). These parameters 
can be estimated by the maximum-likelihood or Markov chain Monte Carlo 
methods. 


Example 5.1. Hausman et al. (1992) apply the ordered probit model to 
the 1988 transactions data of more than 100 stocks. Here we only report their 
result for IBM. There are 206,794 trades. The sample mean (standard deviation) 
of price change $y_i$, time duration $\Delta t_i$, and bid-ask spread are −0.0010 (0.753), 
27.21 (34.13), and 1.9470 (1.4625), respectively. The bid-ask spread is measured 
in ticks. The model used has nine categories for price movement, and the functional 
specifications are 

\begin{align}
x_i\beta &= \beta_1\Delta t_i^* + \sum_{v=1}^{3}\beta_{v+1}\,y_{i-v} + \sum_{v=1}^{3}\beta_{v+4}\,\mathrm{SP5}_{i-v} + \sum_{v=1}^{3}\beta_{v+7}\,\mathrm{IBS}_{i-v} \notag \\
&\quad + \sum_{v=1}^{3}\beta_{v+10}\,[T_\lambda(V_{i-v}) \times \mathrm{IBS}_{i-v}], \tag{5.19} \\
\sigma_i^2(w_i) &= 1.0 + \gamma_1^2\,\Delta t_i^* + \gamma_2^2\,\mathrm{AB}_{i-1}, \tag{5.20}
\end{align}

where $T_\lambda(V) = (V^\lambda - 1)/\lambda$ is the Box-Cox (1964) transformation of $V$ with 
$\lambda \in [0, 1]$ and the explanatory variables are defined by the following: 


• $\Delta t_i^* = (t_i - t_{i-1})/100$ is a rescaled time duration between the (i − 1)th and 
ith trades with time measured in seconds. 

• $\mathrm{AB}_{i-1}$ is the bid-ask spread prevailing at time $t_{i-1}$ in ticks. 

• $y_{i-v}$ ($v = 1, 2, 3$) is the lagged value of price change at $t_{i-v}$ in ticks. With 
$k = 9$, the possible values of price changes are $\{-4, -3, -2, -1, 0, 1, 2, 3, 4\}$ 
in ticks. 

• $V_{i-v}$ ($v = 1, 2, 3$) is the lagged value of dollar volume at the (i − v)th 
transaction, defined as the price of the (i − v)th transaction in dollars times the 
number of shares traded (denominated in hundreds of shares). That is, the 
dollar volume is in hundreds of dollars. 

• $\mathrm{SP5}_{i-v}$ ($v = 1, 2, 3$) is the 5-minute continuously compounded returns of the 
Standard and Poor’s 500 index futures price for the contract maturing in the 
closest month beyond the month in which transaction (i − v) occurred, where 
the return is computed with the futures price recorded 1 minute before the 
nearest round minute prior to $t_{i-v}$ and the price recorded 5 minutes before this. 

• $\mathrm{IBS}_{i-v}$ ($v = 1, 2, 3$) is an indicator variable defined by 

$$
\mathrm{IBS}_j = \begin{cases}
1 & \text{if } P_j > (P_j^a + P_j^b)/2, \\
0 & \text{if } P_j = (P_j^a + P_j^b)/2, \\
-1 & \text{if } P_j < (P_j^a + P_j^b)/2,
\end{cases}
$$

where $P_j^a$ and $P_j^b$ are the ask and bid price at time $t_j$. 
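

Two of the ingredients above, the Box-Cox transformation and the trade indicator, 
are easy to express in code. The following Python sketch is an illustration under 
our own naming; the input values are hypothetical. At the boundary $\lambda = 0$ the 
transformation is taken to be $\ln V$, which is the limit of $(V^\lambda - 1)/\lambda$ as 
$\lambda \to 0$. 

```python
import math


def box_cox(v, lam):
    """Box-Cox transformation T_lambda(V) = (V**lambda - 1) / lambda for V > 0."""
    if v <= 0:
        raise ValueError("the Box-Cox transformation requires a positive argument")
    if lam == 0:
        return math.log(v)          # limiting case as lambda -> 0
    return (v**lam - 1.0) / lam


def trade_indicator(price, ask, bid):
    """IBS: +1, 0, or -1 as the price is above, at, or below the quote midpoint."""
    mid = (ask + bid) / 2.0
    if price > mid:
        return 1
    if price < mid:
        return -1
    return 0


# Hypothetical trade: price at the ask, dollar volume of 250 (in hundreds of dollars).
ibs = trade_indicator(price=100.125, ask=100.125, bid=100.0)
print("IBS:", ibs)
print("T_lambda(V) * IBS for lambda = 0.5:", box_cox(250.0, 0.5) * ibs)
```

The product of the transformed volume and the indicator gives a signed volume 
term, positive for trades above the midpoint and negative for trades below it. 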


The parameter estimates and their t ratios are given in Table 5.5. All the t ratios 
are large except one, indicating that the estimates are highly significant. Such high 
t ratios are not surprising as the sample size is large. For the heavily traded IBM 
stock, the estimation results suggest the following conclusions: 


1. The boundary partitions are not equally spaced but are almost symmetric 
with respect to zero. 

2. The transaction duration $\Delta t_i$ affects both the conditional mean and 
conditional variance of $y_i$ in Eqs. (5.19) and (5.20). 

3. The coefficients of lagged price changes are negative and highly significant, 
indicating price reversals. 

4. As expected, the bid-ask spread at time $t_{i-1}$ significantly affects the 
conditional variance. 


TABLE 5.5 Parameter Estimates of Ordered Probit Model in Eqs. (5.19) and (5.20) 
for the 1988 Transaction Data of IBM, Where t Denotes the t Ratio 

Boundary Partitions of the Probit Model 
Parameter       α1       α2       α3       α4      α5      α6      α7      α8 
Estimate     −4.67    −4.16    −3.11    −1.34    1.33    3.13    4.21    4.73 
t           −145.7   −157.8   −171.6   −155.5   154.9   167.8   152.2   138.9 

Equation Parameters of the Probit Model 
Parameter       γ1      γ2   β1: Δt*   β2: y−1       β3       β4      β5       β6 
Estimate      0.40    0.52     −0.12     −1.01    −0.53    −0.21    1.12    −0.26 
t             15.6    71.1     −11.4    −135.6    −85.0    −47.2    54.2    −12.1 

Parameter       β7      β8        β9       β10      β11      β12     β13 
Estimate      0.01   −1.14     −0.37     −0.17     0.12     0.05    0.02 
t             0.26   −63.6     −21.6     −10.3     47.4     18.6     7.7 

Source: Reprinted with permission of Elsevier from Journal of Financial Economics (1992, Vol. 31, 
p. 345) 


5.4.2 Decomposition Model 


An alternative approach to modeling price change is to decompose it into three 
components and use conditional specifications for the components; see Rydberg 
and Shephard (2003). The three components are an indicator for price change, the 
direction of price movement if there is a change, and the size of price change if a 
change occurs. Specifically, the price change at the ith transaction can be written as 


$$
y_i \equiv P_{t_i} - P_{t_{i-1}} = A_i D_i S_i, \tag{5.21}
$$

where $A_i$ is a binary variable defined as

$$
A_i = \begin{cases}
1 & \text{if there is a price change at the $i$th trade,} \\
0 & \text{if the price remains the same at the $i$th trade,}
\end{cases} \tag{5.22}
$$

$D_i$ is also a discrete variable signifying the direction of the price change if a change
occurs, that is,

$$
D_i \mid (A_i = 1) = \begin{cases}
 1 & \text{if the price increases at the $i$th trade,} \\
-1 & \text{if the price drops at the $i$th trade,}
\end{cases} \tag{5.23}
$$


where $D_i \mid (A_i = 1)$ means that $D_i$ is defined under the condition of $A_i = 1$, and $S_i$ is
the size of the price change in ticks if there is a change at the $i$th trade and $S_i = 0$
if there is no price change at the $i$th trade. When there is a price change, $S_i$ is a
positive integer-valued random variable.

Note that $D_i$ is not needed when $A_i = 0$, and there is a natural ordering in the
decomposition. $D_i$ is well defined only when $A_i = 1$, and $S_i$ is meaningful when
$A_i = 1$ and $D_i$ is given. Model specification under the decomposition makes use
of the ordering.
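As a minimal sketch of the decomposition in Eq. (5.21), the following R code splits a
vector of price changes, measured in ticks, into the three components. The helper name
`ads_decompose` and the toy input are illustrative assumptions and are not part of the
original analysis.

```r
# Decompose tick-valued price changes y_i into A_i (action), D_i (direction),
# and S_i (size) so that y_i = A_i * D_i * S_i, as in Eq. (5.21).
# ads_decompose() is a hypothetical helper name used only for illustration.
ads_decompose <- function(y) {
  A <- as.integer(y != 0)    # 1 if the price changed at the ith trade
  D <- as.integer(sign(y))   # +1 up, -1 down; 0 when A = 0 (D is not needed then)
  S <- abs(y)                # size in ticks; 0 when there is no price change
  data.frame(A = A, D = D, S = S)
}

y <- c(0, 1, 0, -2, 0, 0, 3, -1)   # toy price changes in ticks
ads <- ads_decompose(y)
print(ads)

# The product of the three components recovers the original price change.
all(ads$A * ads$D * ads$S == y)
```

Coding $D_i = 0$ when $A_i = 0$ is only a convenience of this sketch; the text leaves
$D_i$ undefined in that case.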

Let $F_i$ be the information set available at the $i$th transaction. Examples of
elements in $F_i$ are $\Delta t_{i-j}$, $A_{i-j}$, $D_{i-j}$, and $S_{i-j}$ for $j \geq 0$. The evolution of price
change under model (5.21) can then be partitioned as


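$$
\begin{aligned}
P(y_i \mid F_{i-1}) &= P(A_i D_i S_i \mid F_{i-1}) \\
 &= P(S_i \mid D_i, A_i, F_{i-1})\, P(D_i \mid A_i, F_{i-1})\, P(A_i \mid F_{i-1}).
\end{aligned} \tag{5.24}
$$

Since $A_i$ is a binary variable, it suffices to consider the evolution of the probability
$p_i = P(A_i = 1)$ over time. We assume that

$$
\ln\left(\frac{p_i}{1 - p_i}\right) = x_i \beta \quad \text{or} \quad
p_i = \frac{e^{x_i \beta}}{1 + e^{x_i \beta}}, \tag{5.25}
$$

where $x_i$ is a finite-dimensional vector consisting of elements of $F_{i-1}$ and $\beta$ is a
parameter vector. Conditioned on $A_i = 1$, $D_i$ is also a binary variable, and we use
the following model for $\delta_i = P(D_i = 1 \mid A_i = 1)$:

$$
\ln\left(\frac{\delta_i}{1 - \delta_i}\right) = z_i \gamma \quad \text{or} \quad
\delta_i = \frac{e^{z_i \gamma}}{1 + e^{z_i \gamma}}, \tag{5.26}
$$

where $z_i$ is a finite-dimensional vector consisting of elements of $F_{i-1}$ and $\gamma$ is
a parameter vector. To allow for asymmetry between positive and negative price
changes, we assume that

$$
S_i \mid (D_i, A_i = 1) \sim \begin{cases}
1 + g(\lambda_{u,i}) & \text{if } D_i = 1,\ A_i = 1, \\
1 + g(\lambda_{d,i}) & \text{if } D_i = -1,\ A_i = 1,
\end{cases} \tag{5.27}
$$

where $g(\lambda)$ is a geometric distribution with parameter $\lambda$ and the parameters $\lambda_{j,i}$
evolve over time as

$$
\ln\left(\frac{\lambda_{j,i}}{1 - \lambda_{j,i}}\right) = w_i \theta_j \quad \text{or} \quad
\lambda_{j,i} = \frac{e^{w_i \theta_j}}{1 + e^{w_i \theta_j}}, \quad j = u, d, \tag{5.28}
$$

where $w_i$ is again a finite-dimensional explanatory variable in $F_{i-1}$ and $\theta_j$ is a
parameter vector.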

In Eq. (5.27), the probability mass function of a random variable $x$, which
follows the geometric distribution $g(\lambda)$, is

$$
p(x = m) = \lambda (1 - \lambda)^m, \quad m = 0, 1, 2, \ldots.
$$

We added 1 to the geometric distribution so that the price change, if it occurs,
is at least 1 tick. In Eq. (5.28), we take the logistic transformation to ensure that
$\lambda_{j,i} \in [0, 1]$.
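The logistic transformation and the shifted geometric distribution can be sketched in
a few lines of R. The function names `inv_logit` and `size_pmf` and the linear-predictor
value 0.5 are illustrative assumptions; `dgeom()` in base R uses the same
parameterization as the probability mass function above.

```r
# Inverse logit: maps any real linear predictor to a probability in (0, 1).
inv_logit <- function(eta) 1 / (1 + exp(-eta))

# P(S = s) for S = 1 + g(lambda), s = 1, 2, ...
# dgeom(m, prob) returns prob * (1 - prob)^m for m = 0, 1, 2, ...
size_pmf <- function(s, lambda) dgeom(s - 1, prob = lambda)

lambda <- inv_logit(0.5)          # 0.5 is an arbitrary illustrative predictor
probs  <- size_pmf(1:5, lambda)   # probabilities of a change of 1, ..., 5 ticks
print(round(probs, 4))

# Direct formula for comparison: lambda * (1 - lambda)^(s - 1)
all.equal(probs, lambda * (1 - lambda)^(0:4))
```

Because the support of `size_pmf` starts at 1, a price change, when it occurs, is at
least one tick, in agreement with Eq. (5.27).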

The previous specification classifies the ith trade, or transaction, into one of 
three categories: 


1. No price change: $A_i = 0$ and the associated probability is $(1 - p_i)$.

2. A price increase: $A_i = 1$, $D_i = 1$, and the associated probability is $p_i \delta_i$. The
size of the price increase is governed by $1 + g(\lambda_{u,i})$.

3. A price drop: $A_i = 1$, $D_i = -1$, and the associated probability is $p_i (1 - \delta_i)$.
The size of the price drop is governed by $1 + g(\lambda_{d,i})$.

Let $I_i(j)$ for $j = 1, 2, 3$ be the indicator variables of the prior three categories.
That is, $I_i(j) = 1$ if the $j$th category occurs and $I_i(j) = 0$ otherwise. The
log-likelihood function of Eq. (5.24) becomes

$$
\begin{aligned}
\ln[P(y_i \mid F_{i-1})] ={}& I_i(1) \ln(1 - p_i) \\
 &+ I_i(2)\,[\ln(p_i) + \ln(\delta_i) + \ln(\lambda_{u,i}) + (S_i - 1) \ln(1 - \lambda_{u,i})] \\
 &+ I_i(3)\,[\ln(p_i) + \ln(1 - \delta_i) + \ln(\lambda_{d,i}) + (S_i - 1) \ln(1 - \lambda_{d,i})],
\end{aligned}
$$

and the overall log-likelihood function is

$$
\ln[P(y_1, \ldots, y_n \mid F_0)] = \sum_{i=1}^{n} \ln[P(y_i \mid F_{i-1})], \tag{5.29}
$$

which is a function of parameters $\beta$, $\gamma$, $\theta_u$, and $\theta_d$.
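To make the three-category log-likelihood concrete, the following R sketch evaluates
it for given probability sequences. The function name `ads_loglik`, its argument names,
and the toy values are assumptions made for illustration only.

```r
# Log-likelihood of the decomposition model, summing the three-category
# expression over trades. A, D, S are the observed components; p, delta,
# lam_u, and lam_d are the corresponding probabilities for each trade.
ads_loglik <- function(A, D, S, p, delta, lam_u, lam_d) {
  I1 <- A == 0               # category 1: no price change
  I2 <- A == 1 & D == 1      # category 2: price increase
  I3 <- A == 1 & D == -1     # category 3: price drop
  ll <- numeric(length(A))
  ll[I1] <- log(1 - p[I1])
  ll[I2] <- log(p[I2]) + log(delta[I2]) +
            log(lam_u[I2]) + (S[I2] - 1) * log(1 - lam_u[I2])
  ll[I3] <- log(p[I3]) + log(1 - delta[I3]) +
            log(lam_d[I3]) + (S[I3] - 1) * log(1 - lam_d[I3])
  sum(ll)
}

# Toy check with constant probabilities (values chosen arbitrarily).
A <- c(0, 1, 1, 0); D <- c(0, 1, -1, 0); S <- c(0, 1, 2, 0)
n <- length(A)
ll <- ads_loglik(A, D, S, p = rep(0.3, n), delta = rep(0.5, n),
                 lam_u = rep(0.8, n), lam_d = rep(0.7, n))
print(ll)
```

In an application, `p`, `delta`, `lam_u`, and `lam_d` would be computed from the
logistic specifications in Eqs. (5.25), (5.26), and (5.28), and the negative of this
function would be passed to a numerical optimizer.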


Example 5.2. We illustrate the decomposition model by analyzing the intraday 
transactions of IBM stock from November 1, 1990, to January 31, 1991. There were 
63 trading days and 59,838 intraday transactions in the normal trading hours. The 
explanatory variables used are: 


1. $A_{i-1}$: the action indicator of the previous trade [i.e., the $(i - 1)$th trade within
a trading day]

2. $D_{i-1}$: the direction indicator of the previous trade

3. $S_{i-1}$: the size of the price change of the previous trade

4. $V_{i-1}$: the volume of the previous trade, divided by 1000

5. $\Delta t_{i-1}$: time duration from the $(i - 2)$th to $(i - 1)$th trade

6. $BA_i$: the bid-ask spread prevailing at the time of transaction

Because we use lag-1 explanatory variables, the actual sample size is 59,775. It
turns out that $V_{i-1}$, $\Delta t_{i-1}$, and $BA_i$ are not statistically significant for the model
entertained. Thus, only the first three explanatory variables are used. The model
employed is

$$
\begin{aligned}
\ln\left(\frac{p_i}{1 - p_i}\right) &= \beta_0 + \beta_1 A_{i-1}, \\
\ln\left(\frac{\delta_i}{1 - \delta_i}\right) &= \gamma_0 + \gamma_1 D_{i-1}, \\
\ln\left(\frac{\lambda_{u,i}}{1 - \lambda_{u,i}}\right) &= \theta_{u,0} + \theta_{u,1} S_{i-1}, \\
\ln\left(\frac{\lambda_{d,i}}{1 - \lambda_{d,i}}\right) &= \theta_{d,0} + \theta_{d,1} S_{i-1}.
\end{aligned} \tag{5.30}
$$
TABLE 5.6 Parameter Estimates of ADS Model in Eq. (5.30) for IBM Intraday
Transactions from November 1, 1990, to January 31, 1991

Parameter        $\beta_0$     $\beta_1$     $\gamma_0$     $\gamma_1$
Estimate         -1.057    0.962     -0.067    -2.307
Standard Error   0.104     0.044     0.023     0.056

Parameter        $\theta_{u,0}$  $\theta_{u,1}$  $\theta_{d,0}$  $\theta_{d,1}$
Estimate         2.235     -0.670    2.085     -0.509
Standard Error   0.029     0.050     0.187     0.139


The parameter estimates, using the log-likelihood function in Eq. (5.29), are given 
in Table 5.6. The estimated simple model shows some dynamic dependence in the 
price change. In particular, the trade-by-trade price changes of IBM stock exhibit 
some appealing features: 


1. The probability of a price change depends on the previous price change.
Specifically, we have

$$
P(A_i = 1 \mid A_{i-1} = 0) = 0.258, \quad P(A_i = 1 \mid A_{i-1} = 1) = 0.476.
$$

The result indicates that a price change may occur in clusters and, as expected,
most transactions are without price change. When no price change occurred
at the $(i - 1)$th trade, only about one out of four subsequent trades has a
price change. When there is a price change at the $(i - 1)$th
transaction, the probability of a price change in the $i$th trade increases to
about 0.5.


2. The direction of price change is governed by

$$
P(D_i = 1 \mid F_{i-1}, A_i) = \begin{cases}
0.483 & \text{if } D_{i-1} = 0 \text{ (i.e., } A_{i-1} = 0), \\
0.085 & \text{if } D_{i-1} = 1,\ A_i = 1, \\
0.904 & \text{if } D_{i-1} = -1,\ A_i = 1.
\end{cases}
$$

This result says that (a) if no price change occurred at the $(i - 1)$th trade,
then the chances for a price increase or decrease at the $i$th trade are about
even; and (b) the probabilities of consecutive price increases or decreases are
very low. The probability of a price increase at the $i$th trade given that a price
change occurs at the $i$th trade and there was a price increase at the $(i - 1)$th
trade is only 8.5%. However, the probability of a price increase is about
90% given that a price change occurs at the $i$th trade and there was a price
decrease at the $(i - 1)$th trade. Consequently, this result shows the effect of
bid-ask bounce and supports price reversals in high-frequency trading.


3. There is weak evidence suggesting that big price changes have a higher
probability to be followed by another big price change. Consider the size of
a price increase. We have

$$
S_i \mid (D_i = 1) \sim 1 + g(\lambda_{u,i}), \quad
\ln\left(\frac{\lambda_{u,i}}{1 - \lambda_{u,i}}\right) = 2.235 - 0.670 S_{i-1}.
$$

Using the probability mass function of a geometric distribution, we obtain
that the probability of a price increase by one tick is 0.827 at the $i$th trade
if the transaction results in a price increase and $S_{i-1} = 1$. The probability
reduces to 0.709 if $S_{i-1} = 2$ and to 0.556 if $S_{i-1} = 3$. Consequently, the
probability of a large $S_i$ increases with $S_{i-1}$ given that there is a price
increase at the $i$th trade.
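The probabilities quoted in the three items above follow directly from the estimates
in Table 5.6 and the logistic links of Eq. (5.30). The short R sketch below reproduces
them; the variable names are our own, and `plogis()` is the base R logistic
distribution function (the inverse logit).

```r
# Estimates from Table 5.6
beta  <- c(-1.057, 0.962)    # beta_0, beta_1
gam   <- c(-0.067, -2.307)   # gamma_0, gamma_1
thu   <- c(2.235, -0.670)    # theta_{u,0}, theta_{u,1}

# P(A_i = 1 | A_{i-1}) for A_{i-1} = 0 and 1
p_change <- plogis(beta[1] + beta[2] * c(0, 1))

# P(D_i = 1 | D_{i-1}, A_i = 1) for D_{i-1} = 0, 1, -1
p_up <- plogis(gam[1] + gam[2] * c(0, 1, -1))

# lambda_{u,i} for S_{i-1} = 1, 2, 3; this equals P(S_i = 1 | D_i = 1)
# because the shifted geometric distribution puts mass lambda on one tick.
lam_u <- plogis(thu[1] + thu[2] * 1:3)

round(p_change, 3)
round(p_up, 3)
round(lam_u, 3)
```

The computed values agree with those in the text up to rounding in the third decimal
place.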


A difference between the ADS of Eq. (5.21) and ordered probit models is that 
the former does not require any truncation or grouping in the size of a price change. 


R Demonstration for Logistic Linear Regression
The following output has been edited:

> da=read.table("ibm91-ads.txt",header=T)
> da1=read.table("ibm91-adsx.txt",header=T)
> Ai=da[,1]     # Select the variables
> Di=da[,2]
> Aim1=da1[,4]
> Dim1=da1[,5]
> m1=glm(Ai~Aim1,family=binomial)   # Fit a linear logistic model
> summary(m1)

Call:
glm(formula = Ai ~ Aim1, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.1373  -0.7724  -0.7724   1.2180   1.6462

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.05667    0.01142  -92.55   <2e-16 ***
Aim1          0.96164    0.01827   52.62   <2e-16 ***

> di=Di[Ai==1]       # Select the cases in which Ai = 1.
> dim1=Dim1[Ai==1]
> di=(di+abs(di))/2  # Logistic regression works for 1 or 0,
>                    # but di is coded 1 or -1 so that a change is needed.
> m2=glm(di~dim1,family=binomial)
> summary(m2)

Call:
glm(formula = di ~ dim1, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1640  -1.1493   0.4497   1.2058   2.2193

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.06663    0.01728  -3.855 0.000116 ***
dim1         -2.30693    0.03595 -64.171  < 2e-16 ***
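For readers without access to the data files, the first logistic regression can be
mimicked on simulated data. The sketch below is our own illustration, not the book's
analysis: it generates an action-indicator series whose transition probabilities follow
Eq. (5.30) with the Table 5.6 estimates treated as true values, and then refits the
model with `glm`. The sample size and random seed are arbitrary assumptions.

```r
set.seed(1)
n <- 20000
beta0 <- -1.057; beta1 <- 0.962   # Table 5.6 values, used here as "truth"

# Simulate A_i given A_{i-1} through the logistic link of Eq. (5.30).
A <- integer(n)
for (i in 2:n) {
  A[i] <- rbinom(1, 1, plogis(beta0 + beta1 * A[i - 1]))
}

Ai   <- A[-1]    # A_i
Aim1 <- A[-n]    # A_{i-1}
fit <- glm(Ai ~ Aim1, family = binomial)
round(coef(fit), 3)
```

With a sample of this size the fitted coefficients should lie close to the values used
in the simulation, which gives a simple check that the model and the estimation
commands are set up consistently.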


5.5 DURATION MODELS 


Duration models are concerned with time intervals between trades. Longer durations
indicate a lack of trading activity, which in turn signifies a period of no new
information. The dynamic behavior of durations thus contains useful information
about intraday market activities. Using concepts similar to the ARCH models for
volatility, Engle and Russell (1998) propose an autoregressive conditional duration
(ACD) model to describe the evolution of time durations for (heavily traded)
stocks. Zhang et al. (2001) extend the ACD model to account for nonlinearity and
structural breaks in the data. In this section, we introduce some simple duration
models. As mentioned before, intraday transactions exhibit some diurnal pattern.
Therefore, we focus on the adjusted time duration

$$
\Delta t_i^* = \Delta t_i / f(t_i), \tag{5.31}
$$

where $f(t_i)$ is a deterministic function consisting of the cyclical component of
$\Delta t_i$. Obviously, $f(t_i)$ depends on the underlying asset and the systematic behavior
of the market. In practice, there are many ways to estimate $f(t_i)$, but no single
method dominates the others in terms of statistical properties. A common approach
is to use smoothing splines. Here we use simple quadratic functions and indicator
variables to take care of the deterministic component of daily trading activities.
For the IBM data employed in the illustration of ADS models, we assume


$$
f(t_i) = \exp[d(t_i)], \quad d(t_i) = \beta_0 + \sum_{j=1}^{7} \beta_j f_j(t_i), \tag{5.32}
$$

where

$$
f_1(t_i) = -\left(\frac{t_i - 43200}{14400}\right)^2, \quad
f_3(t_i) = \begin{cases}
-\left(\dfrac{t_i - 38700}{7500}\right)^2 & \text{if } t_i < 43200, \\
0 & \text{otherwise,}
\end{cases}
$$

$$
f_2(t_i) = -\left(\frac{t_i - 48300}{9300}\right)^2, \quad
f_4(t_i) = \begin{cases}
-\left(\dfrac{t_i - 48600}{9000}\right)^2 & \text{if } t_i \geq 43200, \\
0 & \text{otherwise,}
\end{cases}
$$

and $f_5(t_i)$ and $f_6(t_i)$ are indicator variables for the first and second 5 minutes of
market opening [i.e., $f_5(\cdot) = 1$ if and only if $t_i$ is between 9:30 AM and 9:35 AM
Eastern time], and $f_7(t_i)$ is the indicator for the last 30 minutes of daily trading
[i.e., $f_7(t_i) = 1$ if and only if the trade occurred between 3:30 PM and 4:00 PM
Eastern time]. Figure 5.7 shows the plot of $f_i(\cdot)$ for $i = 1, \ldots, 4$, where the time
scale on the x axis is in minutes. Note that $f_3(43200) = f_4(43200)$, where 43,200
corresponds to 12:00 noon.

Figure 5.7 Quadratic functions used to remove deterministic component of IBM intraday trading
durations: (a)-(d) are functions $f_1(\cdot)$ to $f_4(\cdot)$ of Eq. (5.32), respectively.

The coefficients $\beta_j$ of Eq. (5.32) are obtained by the least-squares method of
the linear regression

$$
\ln(\Delta t_i) = \beta_0 + \sum_{j=1}^{7} \beta_j f_j(t_i) + \epsilon_i.
$$

The fitted model is

$$
\begin{aligned}
\ln(\widehat{\Delta t_i}) ={}& 2.555 + 0.159 f_1(t_i) + 0.270 f_2(t_i) + 0.384 f_3(t_i) \\
 &+ 0.061 f_4(t_i) - 0.611 f_5(t_i) - 0.157 f_6(t_i) + 0.073 f_7(t_i).
\end{aligned}
$$
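The adjustment of Eqs. (5.31) and (5.32) can be sketched in R as follows. The function
names are illustrative, the constants are those of Eq. (5.32) as we read them, and the
treatment of the interval endpoints for the indicator variables (half-open at the
opening, closed at the close) is an assumption of this sketch. Time is measured in
seconds from midnight, so 9:30 AM is 34,200 and 4:00 PM is 57,600.

```r
# Regressors f_1, ..., f_7 of Eq. (5.32); t is in seconds from midnight.
diurnal_f <- function(t) {
  f1 <- -((t - 43200) / 14400)^2
  f2 <- -((t - 48300) / 9300)^2
  f3 <- ifelse(t < 43200,  -((t - 38700) / 7500)^2, 0)
  f4 <- ifelse(t >= 43200, -((t - 48600) / 9000)^2, 0)
  f5 <- as.numeric(t >= 34200 & t < 34500)    # 9:30-9:35 AM
  f6 <- as.numeric(t >= 34500 & t < 34800)    # 9:35-9:40 AM
  f7 <- as.numeric(t >= 55800 & t <= 57600)   # 3:30-4:00 PM
  cbind(f1, f2, f3, f4, f5, f6, f7)
}

# Fitted coefficients (beta_0, ..., beta_7) of the regression above.
b <- c(2.555, 0.159, 0.270, 0.384, 0.061, -0.611, -0.157, 0.073)

# Adjusted duration of Eq. (5.31): dt / f(t) with f(t) = exp[d(t)].
adjust_duration <- function(dt, t) {
  d <- b[1] + drop(diurnal_f(t) %*% b[-1])
  dt / exp(d)
}

Fmat <- diurnal_f(c(34200, 43200, 57600))   # open, noon, close
print(round(Fmat, 3))
adjust_duration(c(10, 20), c(40000, 50000)) # toy durations in seconds
```

In an application one would re-estimate `b` by `lm(log(dt) ~ diurnal_f(t))` on the
data at hand rather than reuse the coefficients reported for the IBM sample.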
Figure 5.8 shows the time plot of average durations in 5-minute time intervals over
the 63 trading days before and after adjusting for the deterministic component.
Figure 5.8(a) shows the average durations of $\Delta t_i$ and, as expected, exhibits a
diurnal pattern. Figure 5.8(b) shows the average durations of $\Delta t_i^*$ (i.e., after the
adjustment), and the diurnal pattern is largely removed.

Figure 5.8 IBM transactions data from 11/01/90 to 1/31/91: (a) average durations in 5-minute time
intervals and (b) average durations in 5-minute time intervals after adjusting for deterministic component.


5.5.1 The ACD Model 


The autoregressive conditional duration (ACD) model uses the idea of GARCH
models to study the dynamic structure of the adjusted duration $\Delta t_i^*$ of Eq. (5.31).
For ease in notation, we define $x_i = \Delta t_i^*$.

Let $\psi_i = E(x_i \mid F_{i-1})$ be the conditional expectation of the adjusted duration
between the $(i - 1)$th and $i$th trades, where $F_{i-1}$ is the information set available
at the $(i - 1)$th trade. In other words, $\psi_i$ is the expected adjusted duration given
$F_{i-1}$. The basic ACD model is defined as

$$
x_i = \psi_i \epsilon_i, \tag{5.33}
$$

where $\{\epsilon_i\}$ is a sequence of independent and identically distributed nonnegative
random variables such that $E(\epsilon_i) = 1$. In Engle and Russell (1998), $\epsilon_i$ follows a
standard exponential or a standardized Weibull distribution, and $\psi_i$ assumes the
form

$$
\psi_i = \omega + \sum_{j=1}^{r} \gamma_j x_{i-j} + \sum_{j=1}^{s} \omega_j \psi_{i-j}. \tag{5.34}
$$
Such a model is referred to as an ACD(r, s) model. When the distribution of $\epsilon_i$
is exponential, the resulting model is called an EACD(r, s) model. Similarly, if
$\epsilon_i$ follows a Weibull distribution, the model is a WACD(r, s) model. If necessary,
readers are referred to Appendix A for a quick review of exponential and Weibull
distributions.

Similar to GARCH models, the process $\eta_i = x_i - \psi_i$ is a martingale difference
sequence [i.e., $E(\eta_i \mid F_{i-1}) = 0$], and the ACD(r, s) model can be written as

$$
x_i = \omega + \sum_{j=1}^{\max(r,s)} (\gamma_j + \omega_j) x_{i-j} - \sum_{j=1}^{s} \omega_j \eta_{i-j} + \eta_i, \tag{5.35}
$$

which is in the form of an ARMA process with non-Gaussian innovations. It is
understood here that $\gamma_j = 0$ for $j > r$ and $\omega_j = 0$ for $j > s$. Such a representation
can be used to obtain the basic conditions for weak stationarity of the ACD model.
For instance, taking expectation on both sides of Eq. (5.35) and assuming weak
stationarity, we have

$$
E(x_i) = \frac{\omega}{1 - \sum_{j=1}^{\max(r,s)} (\gamma_j + \omega_j)}.
$$

Therefore, we assume $\omega > 0$ and $1 > \sum_j (\gamma_j + \omega_j)$ because the expected duration is
positive. As another application of Eq. (5.35), we study properties of the EACD(1,1)
model.


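EACD(1,1) Model
An EACD(1,1) model can be written as

$$
x_i = \psi_i \epsilon_i, \quad \psi_i = \omega + \gamma_1 x_{i-1} + \omega_1 \psi_{i-1}, \tag{5.36}
$$

where $\epsilon_i$ follows the standard exponential distribution. Using the moments of a
standard exponential distribution in Appendix A, we have $E(\epsilon_i) = 1$, $\text{Var}(\epsilon_i) = 1$,
and $E(\epsilon_i^2) = \text{Var}(\epsilon_i) + [E(\epsilon_i)]^2 = 2$. Assuming that $x_i$ is weakly stationary (i.e.,
the first two moments of $x_i$ are time invariant), we derive the variance of $x_i$. First,
taking the expectation of Eq. (5.36), we have

$$
\begin{aligned}
E(x_i) &= E[E(\psi_i \epsilon_i \mid F_{i-1})] = E(\psi_i), \\
E(\psi_i) &= \omega + \gamma_1 E(x_{i-1}) + \omega_1 E(\psi_{i-1}).
\end{aligned} \tag{5.37}
$$

Under weak stationarity, $E(\psi_i) = E(\psi_{i-1})$ so that Eq. (5.37) gives

$$
\mu_x \equiv E(x_i) = E(\psi_i) = \frac{\omega}{1 - \gamma_1 - \omega_1}. \tag{5.38}
$$

Next, because $E(\epsilon_i^2) = 2$, we have $E(x_i^2) = E[E(\psi_i^2 \epsilon_i^2 \mid F_{i-1})] = 2E(\psi_i^2)$.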
Taking the square of $\psi_i$ in Eq. (5.36) and the expectation and using weak
stationarity of $\psi_i$ and $x_i$, we have, after some algebra, that

$$
E(\psi_i^2) = \mu_x^2 \times \frac{1 - (\gamma_1 + \omega_1)^2}{1 - 2\gamma_1^2 - \omega_1^2 - 2\gamma_1 \omega_1}. \tag{5.39}
$$

Finally, using $\text{Var}(x_i) = E(x_i^2) - [E(x_i)]^2$ and $E(x_i^2) = 2E(\psi_i^2)$, we have

$$
\text{Var}(x_i) = 2E(\psi_i^2) - \mu_x^2 = \mu_x^2 \times \frac{1 - \omega_1^2 - 2\gamma_1 \omega_1}{1 - \omega_1^2 - 2\gamma_1 \omega_1 - 2\gamma_1^2},
$$

where $\mu_x$ is defined in Eq. (5.38). This result shows that, to have time-invariant
unconditional variance, the EACD(1,1) model in Eq. (5.36) must satisfy
$1 > 2\gamma_1^2 + \omega_1^2 + 2\gamma_1 \omega_1$. The variance of a WACD(1,1) model can be obtained by using the
same techniques and the first two moments of a standardized Weibull distribution.
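The moment formulas of the EACD(1,1) model are easy to evaluate numerically. The
following R sketch, with a function name of our own choosing, computes the
unconditional mean of Eq. (5.38) and the variance derived above, and stops if the
stationarity conditions are violated. The parameter values in the example are those
of the simulation model in Eq. (5.40), used here with exponential innovations purely
for illustration.

```r
# Unconditional mean and variance of an EACD(1,1) model.
# eacd11_moments() is a hypothetical helper name.
eacd11_moments <- function(omega, gamma1, omega1) {
  s     <- gamma1 + omega1
  denom <- 1 - 2 * gamma1^2 - omega1^2 - 2 * gamma1 * omega1
  # Conditions for a positive mean and a time-invariant variance
  stopifnot(omega > 0, s < 1, denom > 0)
  mu    <- omega / (1 - s)                 # Eq. (5.38)
  Epsi2 <- mu^2 * (1 - s^2) / denom        # Eq. (5.39)
  c(mean = mu, variance = 2 * Epsi2 - mu^2)
}

m <- eacd11_moments(omega = 0.3, gamma1 = 0.2, omega1 = 0.7)
print(m)
```

For these parameter values the mean is 3 and the variance is 13.8, which can also be
verified from the closed-form variance expression,
$9 \times (1 - 0.49 - 0.28)/(1 - 0.49 - 0.28 - 0.08)$.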


ACD Models with a Generalized Gamma Distribution 

In the statistical literature, the intensity function is often expressed in terms of the
hazard function. As shown in Appendix B, the hazard function of an EACD model is
constant over time and that of a WACD model is a monotone function. These
hazard functions are rather restrictive in application as the intensity function of
stock transactions might not be constant or monotone over time. To increase the
flexibility of the associated hazard function, Zhang et al. (2001) employ a (standardized)
generalized gamma distribution for $\epsilon_i$. See Appendix A for some basic
properties of a generalized gamma distribution. The resulting hazard function may
assume various patterns, including U shape or inverted U shape. We refer to an
ACD model with innovations that follow a generalized gamma distribution as a
GACD(r, s) model.


5.5.2 Simulation 


To illustrate ACD processes, we generated 500 observations from the ACD(1,1)
model:

$$
x_i = \psi_i \epsilon_i, \quad \psi_i = 0.3 + 0.2 x_{i-1} + 0.7 \psi_{i-1} \tag{5.40}
$$

using two different innovational distributions for $\epsilon_i$. In case 1, $\epsilon_i$ is assumed to
follow a standardized Weibull distribution with parameter $\alpha = 1.5$. In case 2, $\epsilon_i$
follows a (standardized) generalized gamma distribution with parameters $\kappa = 1.5$
and $\alpha = 0.5$.

Figure 5.9(a) shows the time plot of the WACD(1,1) series, whereas
Figure 5.10(a) is the GACD(1,1) series. Figure 5.11 plots the histograms of both
simulated series. The difference between the two models is evident. Finally,
the sample ACFs of the two simulated series are shown in Figures 5.12(a) and
5.13(a), respectively. The serial dependence of the data is clearly seen.
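A simulation of this kind can be sketched in R as follows. This is not the program
used to produce the figures; the function names, the burn-in length, and the random
seed are assumptions. The standardized Weibull variate is a Weibull variate with shape
$\alpha$ and scale $1/\Gamma(1 + 1/\alpha)$, and the standardized generalized gamma variate is
obtained as $\lambda G^{1/\alpha}$ with $G$ a gamma variate with shape $\kappa$ and
$\lambda = \Gamma(\kappa)/\Gamma(\kappa + 1/\alpha)$, so that both have mean 1.

```r
# Standardized Weibull innovations with shape alpha and mean 1.
r_std_weibull <- function(n, alpha) {
  rweibull(n, shape = alpha, scale = 1 / gamma(1 + 1 / alpha))
}

# Standardized generalized gamma innovations with mean 1.
r_std_gengamma <- function(n, kappa, alpha) {
  lambda <- gamma(kappa) / gamma(kappa + 1 / alpha)
  lambda * rgamma(n, shape = kappa)^(1 / alpha)
}

# Generate an ACD(1,1) series from given innovations and drop a burn-in.
sim_acd11 <- function(eps, omega, gamma1, omega1, burn = 200) {
  n <- length(eps)
  x <- psi <- numeric(n)
  psi[1] <- omega / (1 - gamma1 - omega1)   # start at the unconditional mean
  x[1] <- psi[1] * eps[1]
  for (i in 2:n) {
    psi[i] <- omega + gamma1 * x[i - 1] + omega1 * psi[i - 1]
    x[i] <- psi[i] * eps[i]
  }
  x[(burn + 1):n]
}

set.seed(42)
xw <- sim_acd11(r_std_weibull(700, alpha = 1.5), 0.3, 0.2, 0.7)
xg <- sim_acd11(r_std_gengamma(700, kappa = 1.5, alpha = 0.5), 0.3, 0.2, 0.7)
c(mean(xw), mean(xg))   # both models have unconditional mean 0.3/(1 - 0.9) = 3
```

Time plots, histograms, and sample ACFs analogous to those in the figures can then be
drawn with `plot(xw, type = "l")`, `hist(xw)`, and `acf(xw)`.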
Figure 5.9 Simulated WACD(1,1) series in Eq. (5.40): (a) original series and (b) standardized series
after estimation. There are 500 observations.

Figure 5.10 Simulated GACD(1,1) series in Eq. (5.40): (a) original series and (b) standardized series
after estimation. There are 500 observations.

Figure 5.11 Histograms of simulated duration processes with 500 observations: (a) WACD(1,1) model
and (b) GACD(1,1) model.

Figure 5.12 Sample autocorrelation function of simulated WACD(1,1) series with 500 observations:
(a) original series and (b) standardized residual series.

Figure 5.13 Sample autocorrelation function of simulated GACD(1,1) series with 500 observations:
(a) original series and (b) standardized residual series.


5.5.3 Estimation 


For an ACD(r, s) model, let $i_o = \max(r, s)$ and $\mathbf{x}_t = (x_1, \ldots, x_t)'$. The likelihood
function of the durations $x_1, \ldots, x_T$ is

$$
f(\mathbf{x}_T \mid \theta) = \left[\prod_{i=i_o+1}^{T} f(x_i \mid F_{i-1}, \theta)\right] \times f(\mathbf{x}_{i_o} \mid \theta),
$$

where $\theta$ denotes the vector of model parameters, and $T$ is the sample size. The
marginal probability density function $f(\mathbf{x}_{i_o} \mid \theta)$ of the previous equation is rather
complicated for a general ACD model. Because its impact on the likelihood function
is diminishing as the sample size $T$ increases, this marginal density is often ignored,
resulting in use of the conditional-likelihood method. For a WACD model, we use
the probability density function (pdf) of Eq. (5.56) and obtain the conditional
log-likelihood function

$$
\ell(\mathbf{x} \mid \theta, \mathbf{x}_{i_o}) = \sum_{i=i_o+1}^{T} \left\{ \alpha \ln\left[\Gamma\left(1 + \frac{1}{\alpha}\right)\right]
 + \ln\left(\frac{\alpha}{x_i}\right) + \alpha \ln\left(\frac{x_i}{\psi_i}\right)
 - \left[\frac{\Gamma(1 + 1/\alpha)\, x_i}{\psi_i}\right]^{\alpha} \right\}, \tag{5.41}
$$

where $\psi_i = \omega + \sum_{j=1}^{r} \gamma_j x_{i-j} + \sum_{j=1}^{s} \omega_j \psi_{i-j}$, $\theta = (\omega, \gamma_1, \ldots, \gamma_r, \omega_1, \ldots, \omega_s, \alpha)'$,
and $\mathbf{x} = (x_{i_o+1}, \ldots, x_T)'$. When $\alpha = 1$, the (conditional) log-likelihood function
reduces to that of an EACD(r, s) model.

For a GACD(r, s) model, the conditional log-likelihood function is

$$
\ell(\mathbf{x} \mid \theta, \mathbf{x}_{i_o}) = \sum_{i=i_o+1}^{T} \left\{ \ln\left(\frac{\alpha}{\Gamma(\kappa)}\right)
 + (\kappa\alpha - 1) \ln(x_i) - \kappa\alpha \ln(\lambda \psi_i)
 - \left(\frac{x_i}{\lambda \psi_i}\right)^{\alpha} \right\}, \tag{5.42}
$$

where $\lambda = \Gamma(\kappa)/\Gamma(\kappa + 1/\alpha)$ and the parameter vector $\theta$ now also includes $\kappa$. As
expected, when $\kappa = 1$, $\lambda = 1/\Gamma(1 + 1/\alpha)$ and the log-likelihood function in Eq.
(5.42) reduces to that of a WACD(r, s) model in Eq. (5.41). This log-likelihood
function can be rewritten in many ways to simplify the estimation.

TABLE 5.7 Estimation Results for Simulated ACD(1,1) Series with 500
Observations: For WACD(1,1) Series and GACD(1,1) Series

WACD(1,1) Model
Parameter        $\omega$        $\gamma_1$      $\omega_1$      $\alpha$
True             0.3      0.2      0.7      1.5
Estimate         0.364    0.100    0.767    1.477
Standard Error   (0.139)  (0.025)  (0.060)  (0.052)

GACD(1,1) Model
Parameter        $\omega$        $\gamma_1$      $\omega_1$      $\alpha$        $\kappa$
True             0.3      0.2      0.7      0.5      1.5
Estimate         0.401    0.343    0.561    0.436    2.077
Standard Error   (0.117)  (0.074)  (0.065)  (0.078)  (0.653)

Under some regularity conditions, the conditional maximum-likelihood estimates 
are asymptotically normal; see Engle and Russell (1998) and the references therein. 
In practice, simulation can be used to obtain finite-sample reference distributions 
for the problem of interest once a duration model is specified. 
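The conditional-likelihood method for a WACD(1,1) model can be sketched in R with the
general-purpose optimizer `optim`. This is only an illustration and not the RATS or
Fortran programs used in the book: the function name, the starting values, the
initialization of the conditional mean at the sample mean, and the crude penalty used
to keep the search inside the parameter space are all assumptions.

```r
# Negative conditional log-likelihood of a WACD(1,1) model, based on Eq. (5.41).
# par = (omega, gamma1, omega1, alpha).
wacd11_nll <- function(par, x) {
  omega <- par[1]; g1 <- par[2]; w1 <- par[3]; a <- par[4]
  # Penalize values outside the parameter space (an assumption of this sketch).
  if (omega <= 0 || g1 < 0 || w1 < 0 || g1 + w1 >= 1 || a <= 0) return(1e10)
  n <- length(x)
  psi <- numeric(n)
  psi[1] <- mean(x)                     # initialization (assumption)
  for (i in 2:n) psi[i] <- omega + g1 * x[i - 1] + w1 * psi[i - 1]
  G  <- gamma(1 + 1 / a)
  xi <- x[-1]; ps <- psi[-1]            # condition on the first observation
  -sum(a * log(G) + log(a / xi) + a * log(xi / ps) - (G * xi / ps)^a)
}

# Simulate a WACD(1,1) series from Eq. (5.40) and estimate it.
set.seed(7)
n <- 2000
eps <- rweibull(n, shape = 1.5, scale = 1 / gamma(1 + 1 / 1.5))
x <- psi <- numeric(n)
psi[1] <- 3; x[1] <- psi[1] * eps[1]
for (i in 2:n) {
  psi[i] <- 0.3 + 0.2 * x[i - 1] + 0.7 * psi[i - 1]
  x[i] <- psi[i] * eps[i]
}

start <- c(0.5, 0.1, 0.8, 1)
fit <- optim(start, wacd11_nll, x = x, control = list(maxit = 5000))
round(fit$par, 3)   # estimates of (omega, gamma1, omega1, alpha)
```

Approximate standard errors could be obtained by calling `optim` with
`hessian = TRUE` and inverting the returned Hessian, subject to the usual regularity
conditions mentioned above.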


Example 5.3. (Simulated ACD(1,1) series, continued). Consider the simulated
WACD(1,1) and GACD(1,1) series of Eq. (5.40). We apply the conditional-likelihood
method and obtain the results in Table 5.7. The estimates appear to be
reasonable. Let $\hat{\psi}_i$ be the 1-step-ahead prediction of $\psi_i$ and $\hat{\epsilon}_i = x_i / \hat{\psi}_i$ be the
standardized series, which can be regarded as standardized residuals of the series. If
the model is adequately specified, $\{\hat{\epsilon}_i\}$ should behave as a sequence of independent
and identically distributed random variables. Figures 5.9(b) and 5.10(b) show the
time plots of $\hat{\epsilon}_i$ for both models. The sample ACFs of $\hat{\epsilon}_i$ for both fitted models are
shown in Figures 5.12(b) and 5.13(b), respectively. It is evident that no significant
serial correlations are found in the $\hat{\epsilon}_i$ series.
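The model-checking step can be sketched in R with the Ljung-Box statistic provided by
`Box.test`. For simplicity the sketch below simulates an EACD(1,1) series with the
coefficients of Eq. (5.40) and standardizes it with the true conditional means; in
practice $\psi_i$ would be replaced by the fitted values $\hat{\psi}_i$. The seed and
sample size are arbitrary assumptions.

```r
set.seed(123)
n <- 2000
eps <- rexp(n)                        # standard exponential innovations
x <- psi <- numeric(n)
psi[1] <- 3; x[1] <- psi[1] * eps[1]
for (i in 2:n) {
  psi[i] <- 0.3 + 0.2 * x[i - 1] + 0.7 * psi[i - 1]
  x[i] <- psi[i] * eps[i]
}

ehat <- x / psi                       # standardized series (psi known here)

# Ljung-Box statistics for the durations, the standardized series,
# and the squared standardized series.
q_x  <- Box.test(x,      lag = 10, type = "Ljung-Box")
q_e  <- Box.test(ehat,   lag = 10, type = "Ljung-Box")
q_e2 <- Box.test(ehat^2, lag = 10, type = "Ljung-Box")
c(Q_x = unname(q_x$statistic), Q_ehat = unname(q_e$statistic),
  Q_ehat2 = unname(q_e2$statistic))
```

The statistic for the raw durations should be far larger than that of the standardized
series, reflecting the serial dependence that the ACD structure is designed to capture.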


Example 5.4. As an illustration of duration models, we consider the transaction
durations of IBM stock on five consecutive trading days from November 1
to November 7, 1990. Focusing on positive transaction durations, we have 3534
observations. In addition, the data have been adjusted by removing the deterministic
component in Eq. (5.32). That is, we employ 3534 positive adjusted durations
as defined in Eq. (5.31).

Figure 5.14 Time plots of durations for IBM stock traded in first five trading days of November 1990:
(a) adjusted series and (b) normalized innovations of a WACD(1,1) model. There are 3534 nonzero
durations.

Figure 5.14(a) shows the time plot of the adjusted (positive) durations for the
first five trading days of November 1990, and Figure 5.15(a) gives the sample ACF
of the series. There exist some serial correlations in the adjusted durations. We fit
a WACD(1,1) model to the data and obtain the model

$$
x_i = \psi_i \epsilon_i, \quad \psi_i = 0.169 + 0.064 x_{i-1} + 0.885 \psi_{i-1}, \tag{5.43}
$$

where $\{\epsilon_i\}$ is a sequence of independent and identically distributed random variates
that follow the standardized Weibull distribution with parameter $\hat{\alpha} = 0.879(0.012)$,
where 0.012 is the estimated standard error. Standard errors of the estimates in
Eq. (5.43) are 0.039, 0.010, and 0.018, respectively. All t ratios of the estimates
are greater than 4.2, indicating that the estimates are significant at the 1% level.
Figure 5.14(b) shows the time plot of $\hat{\epsilon}_i = x_i / \hat{\psi}_i$, and Figure 5.15(b) provides the
sample ACF of $\hat{\epsilon}_i$. The Ljung-Box statistics show Q(10) = 4.96 and Q(20) =
10.75 for the $\hat{\epsilon}_i$ series. Clearly, the standardized innovations have no significant
serial correlations. In fact, the sample autocorrelations of the squared series $\{\hat{\epsilon}_i^2\}$
are also small with Q(10) = 6.20 and Q(20) = 11.16, further confirming lack of
serial dependence in the normalized innovations. In addition, the mean and standard
deviation of a standardized Weibull distribution with $\alpha = 0.879$ are 1.00 and 1.14,
respectively. These numbers are close to the sample mean and standard deviation
of $\{\hat{\epsilon}_i\}$, which are 1.01 and 1.22, respectively. The fitted model seems adequate.

Figure 5.15 Sample autocorrelation function of adjusted durations for IBM stock traded in first five
trading days of November 1990: (a) adjusted series and (b) normalized innovations for WACD(1,1)
model.

In model (5.43), the estimated coefficients show $\hat{\gamma}_1 + \hat{\omega}_1 \approx 0.949$, indicating
certain persistence in the adjusted durations. The expected adjusted duration is
0.169/(1 - 0.064 - 0.885) = 3.31 seconds, which is close to the sample mean 3.29
of the adjusted durations. The estimated $\alpha$ of the standardized Weibull distribution
is 0.879, which is less than but close to 1. Thus, the conditional hazard function is
monotonically decreasing at a slow rate.
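The summary quantities quoted for model (5.43) can be checked with a few lines of R.
The sketch uses only the reported estimates; the variable names are our own, and the
hazard expression is the standard one for a Weibull distribution with shape $\alpha$ and
scale $1/\Gamma(1 + 1/\alpha)$.

```r
omega <- 0.169; gamma1 <- 0.064; omega1 <- 0.885; alpha <- 0.879

persistence <- gamma1 + omega1           # gamma_1 + omega_1
mu_x <- omega / (1 - persistence)        # expected adjusted duration

# Standard deviation of a standardized Weibull variate (mean is 1 by construction).
G1 <- gamma(1 + 1 / alpha)
G2 <- gamma(1 + 2 / alpha)
sd_eps <- sqrt(G2 / G1^2 - 1)

# Hazard function of the standardized Weibull distribution.
hazard <- function(y) alpha * G1^alpha * y^(alpha - 1)

round(c(persistence = persistence, mu_x = mu_x, sd_eps = sd_eps), 3)
hazard(c(0.5, 1, 2, 4))                  # decreasing because alpha < 1
```

Since $\alpha - 1 = -0.121$ is close to zero, the hazard values decline only slowly as the
duration grows, in line with the statement above.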

If a generalized gamma distribution function is used for the innovations, then
the fitted GACD(1,1) model is

$$
x_i = \psi_i \epsilon_i, \quad \psi_i = 0.141 + 0.063 x_{i-1} + 0.897 \psi_{i-1}, \tag{5.44}
$$

where $\{\epsilon_i\}$ follows a standardized, generalized gamma distribution in Eq. (5.57)
with parameters $\kappa = 4.248(1.046)$ and $\alpha = 0.395(0.053)$, where the number in
parentheses denotes estimated standard error. Standard errors of the three parameters
in Eq. (5.44) are 0.041, 0.010, and 0.019, respectively. All of the estimates are
statistically significant at the 1% level. Again, the normalized innovational process
$\{\hat{\epsilon}_i\}$ and its squared series have no significant serial correlation, where $\hat{\epsilon}_i = x_i / \hat{\psi}_i$
based on model (5.44). Specifically, for the $\hat{\epsilon}_i$ process, we have Q(10) = 4.95 and
Q(20) = 10.28. For the $\hat{\epsilon}_i^2$ series, we have Q(10) = 6.36 and Q(20) = 10.89.

The expected duration of model (5.44) is 3.52, which is slightly greater than
that of the WACD(1,1) model in Eq. (5.43). Similarly, the persistence parameter
$\hat{\gamma}_1 + \hat{\omega}_1$ of model (5.44) is also slightly higher at 0.96.


Remark. Estimation of EACD models can be carried out by using programs 
for ARCH models with some minor modification; see Engle and Russell (1998). In 
this book, we use either the RATS program or some Fortran programs developed 
by the author to estimate the duration models. Limited experience indicates that it 
is harder to estimate a GACD model than an EACD or a WACD model. RATS 
programs used to estimate WACD and GACD models are given in Appendix C. 


5.6 NONLINEAR DURATION MODELS 


Nonlinear features are also commonly found in high-frequency data. As an illustration,
we apply some nonlinearity tests discussed in Chapter 4 to the normalized
innovations $\hat{\epsilon}_i$ of the WACD(1,1) model for the IBM transaction durations in
Example 5.4; see Eq. (5.43). Based on an AR(4) model, the test results are given
in part (a) of Table 5.8. As expected from the model diagnostics of Example 5.4,
the Ori-F test indicates no quadratic nonlinearity in the normalized innovations.
However, the TAR-F test statistics suggest strong nonlinearity.

Based on the test results in Table 5.8, we entertain a threshold duration model
with two regimes for the IBM intraday durations. The threshold variable is $x_{i-1}$
(i.e., lag-1 adjusted duration). The estimated threshold value is 3.79. The fitted
threshold WACD(1,1) model is $x_i = \psi_i \epsilon_i$, where

$$
\psi_i = \begin{cases}
0.020 + 0.257 x_{i-1} + 0.847 \psi_{i-1}, \quad \epsilon_i \sim w(0.901) & \text{if } x_{i-1} \leq 3.79, \\
1.808 + 0.027 x_{i-1} + 0.501 \psi_{i-1}, \quad \epsilon_i \sim w(0.845) & \text{if } x_{i-1} > 3.79,
\end{cases} \tag{5.45}
$$

where $w(\alpha)$ denotes a standardized Weibull distribution with parameter $\alpha$. The
numbers of observations in the two regimes are 2503 and 1030, respectively. In Eq.
(5.45), the standard errors of the parameters for the first regime are 0.043, 0.041,
0.024, and 0.014, whereas those for the second regime are 0.526, 0.020, 0.147, and
0.020, respectively.

Consider the normalized innovations $\hat{\epsilon}_i = x_i / \hat{\psi}_i$ of the threshold WACD(1,1)
model in Eq. (5.45). We obtain Q(12) = 9.8 and Q(24) = 23.9 for $\hat{\epsilon}_i$ and Q(12) =
8.0 and Q(24) = 16.7 for $\hat{\epsilon}_i^2$. Thus, there are no significant serial correlations in
the $\hat{\epsilon}_i$ and $\hat{\epsilon}_i^2$ series. Furthermore, applying the same nonlinearity tests as before
to this newly normalized innovational series $\hat{\epsilon}_i$, we detect no nonlinearity; see part
(b) of Table 5.8. Consequently, the two-regime threshold WACD(1,1) model in Eq.
(5.45) is adequate.

TABLE 5.8 Nonlinearity Tests for IBM Transaction Durations from November 1 to
November 7, 1990^a

(a) Normalized Innovations of a WACD(1,1) Model
Type      Ori-F   TAR-F(1)  TAR-F(2)  TAR-F(3)  TAR-F(4)
Test      0.343   3.288     3.142     3.128     0.297
p Value   0.969   0.006     0.008     0.008     0.915

(b) Normalized Innovations of a Threshold WACD(1,1) Model
Type      Ori-F   TAR-F(1)  TAR-F(2)  TAR-F(3)  TAR-F(4)
Test      0.163   0.746     1.899     1.752     0.270
p Value   0.998   0.589     0.091     0.119     0.929

^a Only intraday durations are used. The number in parentheses of TAR-F tests denotes time delay.

If we classify the two regimes as heavy and thin trading periods, then the thresh- 
old model suggests that the trading dynamics measured by intraday transaction 
durations are different between heavy and thin trading periods for IBM stock even 
after the adjustment of diurnal pattern. This is not surprising as market activities 
are often driven by the arrival of news and other information. 

The estimated threshold WACD(1,1) model in Eq. (5.45) contains some insignificant
parameters. We refine the model and obtain the result:

$$
\psi_i = \begin{cases}
0.225 x_{i-1} + 0.867 \psi_{i-1}, \quad \epsilon_i \sim w(0.902) & \text{if } x_{i-1} \leq 3.79, \\
1.618 + 0.614 \psi_{i-1}, \quad \epsilon_i \sim w(0.846) & \text{if } x_{i-1} > 3.79.
\end{cases}
$$

All of the estimates of the refined model are highly significant. The Ljung-Box
statistics of the standardized innovations $\hat{\epsilon}_i = x_i / \hat{\psi}_i$ show Q(10) = 5.91(0.82)
and Q(20) = 16.04(0.71) and those of $\hat{\epsilon}_i^2$ give Q(10) = 5.35(0.87) and Q(20) =
15.20(0.76), where the number in parentheses is the p value. Therefore, the refined
model is adequate. The RATS program used to estimate the prior model is given
in Appendix C.
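To show how the two-regime recursion operates, the following R sketch computes the
conditional expected durations of the refined threshold model for a given series of
adjusted durations. It is an illustration only and is unrelated to the RATS program of
Appendix C; the function name, the initialization of the first conditional mean, and the
toy data are assumptions.

```r
# Conditional expected durations psi_i under the refined two-regime model.
# The regime at trade i is determined by the lag-1 adjusted duration.
tacd_psi <- function(x, psi1 = mean(x), thr = 3.79) {
  n <- length(x)
  psi <- numeric(n)
  psi[1] <- psi1                         # initialization is an assumption
  for (i in 2:n) {
    psi[i] <- if (x[i - 1] <= thr) {
      0.225 * x[i - 1] + 0.867 * psi[i - 1]    # first regime
    } else {
      1.618 + 0.614 * psi[i - 1]               # second regime
    }
  }
  psi
}

x   <- c(2.0, 5.0, 1.0, 4.0)             # toy adjusted durations
psi <- tacd_psi(x, psi1 = 3)
ehat <- x / psi                          # standardized innovations
round(cbind(x, psi, ehat), 3)
```

The standardized series `ehat` is what the Ljung-Box statistics and nonlinearity tests
quoted above would be applied to when the recursion is run on the actual data.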


5.7 BIVARIATE MODELS FOR PRICE CHANGE AND DURATION 


In this section, we introduce a model that considers jointly the process of price 
change and the associated duration. As mentioned before, many intraday transac- 
tions of a stock result in no price change. Those transactions are highly relevant 
to trading intensity, but they do not contain direct information on price movement. 
Therefore, to simplify the complexity involved in modeling price change, we focus 
on transactions that result in a price change and consider a price change and dura- 
tion (PCD) model to describe the multivariate dynamics of price change and the 
associated time duration. 
We continue to use the same notation as before, but the definition is changed to
transactions with a price change. Let $t_i$ be the calendar time of the $i$th price change
of an asset. As before, $t_i$ is measured in seconds from midnight of a trading day. Let
$P_{t_i}$ be the transaction price when the $i$th price change occurred and $\Delta t_i = t_i - t_{i-1}$
be the time duration between price changes. In addition, let $N_i$ be the number of
trades in the time interval $(t_{i-1}, t_i)$ that result in no price change. This new variable
is used to represent trading intensity during a period of no price change. Finally,
let $D_i$ be the direction of the $i$th price change with $D_i = 1$ when the price goes up
and $D_i = -1$ when the price comes down, and let $S_i$ be the size of the $i$th price
change measured in ticks. Under the new definitions, the price of a stock evolves
over time by

$$
P_{t_i} = P_{t_{i-1}} + D_i S_i, \tag{5.46}
$$

and the transactions data consist of $\{\Delta t_i, N_i, D_i, S_i\}$ for the $i$th price change. The
PCD model is concerned with the joint analysis of $(\Delta t_i, N_i, D_i, S_i)$.
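The construction of $\{\Delta t_i, N_i, D_i, S_i\}$ from raw transactions can be sketched in R as
follows. The function name `pcd_data`, the toy data, and the tick size of 1/8 dollar
are assumptions made for illustration. The first price change is dropped because no
previous change is available to define its duration.

```r
# Build {dt_i, N_i, D_i, S_i} for trades that result in a price change.
# time: seconds from midnight; price: transaction prices; tick: tick size
# (the default of 1/8 dollar is an assumption of this sketch).
pcd_data <- function(time, price, tick = 1 / 8) {
  dp    <- diff(price)
  chg   <- which(dp != 0) + 1             # indices of trades with a price change
  ticks <- round(dp[chg - 1] / tick)      # signed price change in ticks
  data.frame(
    dt = diff(time[chg]),                 # duration between price changes
    N  = diff(chg) - 1,                   # trades in between with no price change
    D  = sign(ticks)[-1],                 # direction of the price change
    S  = abs(ticks)[-1]                   # size of the price change in ticks
  )
}

time  <- c(0, 5, 9, 20, 31, 40, 52)
price <- c(100, 100, 100.125, 100.125, 100.125, 99.875, 100)
pcd <- pcd_data(time, price)
print(pcd)
```

In this toy example the price changes occur at the third, sixth, and seventh trades, so
the first retained row has a duration of 31 seconds with two intervening trades that
leave the price unchanged.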


Remark. Focusing on transactions associated with a price change can reduce 
the sample size dramatically. For example, consider the intraday data of IBM stock 
from November 1, 1990 to January 31, 1991. There were 60,265 intraday trades, 
but only 19,022 of them resulted in a price change. In addition, there is no diurnal 
pattern in time durations between price changes. 


To illustrate the relationship between the price movements of all transactions 
and those of transactions associated with a price change, we consider the intraday 
trading of IBM stock on November 21, 1990. There were 726 transactions on that 
day during normal trading hours, but only 195 trades resulted in a price change. 
Figure 5.16 shows the time plot of the price series for both cases. As expected, the 
price series are the same. 

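The PCD model decomposes the joint distribution of $(\Delta t_i, N_i, D_i, S_i)$ given 
$F_{i-1}$ as 


$$\begin{aligned}
f(\Delta t_i, N_i, D_i, S_i \mid F_{i-1}) 
  = {} & f(S_i \mid D_i, N_i, \Delta t_i, F_{i-1})\, f(D_i \mid N_i, \Delta t_i, F_{i-1}) \\
       & \times f(N_i \mid \Delta t_i, F_{i-1})\, f(\Delta t_i \mid F_{i-1}).
\end{aligned} \tag{5.47}$$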


This partition enables us to specify suitable econometric models for the condi- 
tional distributions and, hence, to simplify the modeling task. There are many 
ways to specify models for the conditional distributions. A proper specification 
might depend on the asset under study. Here we employ the specifications used by 
McCulloch and Tsay (2000), who use generalized linear models for the discrete-valued 
variables and a time series model for the continuous variable $\ln(\Delta t_i)$. 

For the time duration between price changes, we use the model 


$$\ln(\Delta t_i) = \beta_0 + \beta_1 \ln(\Delta t_{i-1}) + \beta_2 S_{i-1} + \sigma\epsilon_i, \tag{5.48}$$


where $\sigma$ is a positive number and $\{\epsilon_i\}$ is a sequence of iid $N(0, 1)$ random 
variables. This is a multiple linear regression model with lagged variables. Other 
explanatory variables can be added if necessary. The log transformation is used to 
ensure the positiveness of time duration. 

Figure 5.16 Time plots of intraday transaction prices of IBM stock on November 21, 1990: (a) all 
transactions and (b) transactions that resulted in price change. 

The conditional model for $N_i$ is further partitioned into two parts because empirical 
data suggest a concentration of $N_i$ at 0. The first part of the model for $N_i$ is 
the logit model 


$$P(N_i = 0 \mid \Delta t_i, F_{i-1}) = \mathrm{logit}[\alpha_0 + \alpha_1 \ln(\Delta t_i)], \tag{5.49}$$


where $\mathrm{logit}(x) = \exp(x)/[1 + \exp(x)]$, whereas the second part of the model is 


$$N_i \mid (N_i > 0, \Delta t_i, F_{i-1}) \sim 1 + g(\lambda_i), \qquad
\lambda_i = \frac{\exp[\gamma_0 + \gamma_1 \ln(\Delta t_i)]}{1 + \exp[\gamma_0 + \gamma_1 \ln(\Delta t_i)]}, \tag{5.50}$$


where $\sim$ means “is distributed as,” and $g(\lambda)$ denotes a geometric distribution with 
parameter $\lambda$, which is in the interval (0, 1). 
The model for direction $D_i$ is 


$$D_i \mid (N_i, \Delta t_i, F_{i-1}) = \mathrm{sign}(\mu_i + \sigma_i\epsilon), \tag{5.51}$$

where $\epsilon$ is a $N(0, 1)$ random variable, and 

$$\mu_i = \omega_0 + \omega_1 D_{i-1} + \omega_2 \ln(\Delta t_i),$$


$$\ln(\sigma_i) = \beta \left| \sum_{j=1}^{4} D_{i-j} \right| = \beta\,|D_{i-1} + D_{i-2} + D_{i-3} + D_{i-4}|.$$


In other words, $D_i$ is governed by the sign of a normal random variable with mean 
$\mu_i$ and variance $\sigma_i^2$. A special characteristic of the prior model is the function 
for $\ln(\sigma_i)$. For intraday transactions, a key feature is the price reversal between 
consecutive price changes. This feature is modeled by the dependence of $D_i$ on 
$D_{i-1}$ in the mean equation with a negative $\omega_1$ parameter. However, there exists 
an occasional local trend in the price movement. The previous variance equation 
allows for such a local trend by increasing the uncertainty in the direction of price 
movement when the past data showed evidence of a local trend. For a normal 
distribution with a fixed mean, increasing its variance moves the chances of a positive 
and a negative draw toward equality. This in turn increases the chance for 
a sequence of all positive or all negative draws. Such a sequence produces a local 
trend in price movement. 
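The effect of the variance equation can be checked numerically, because Eq. (5.51) implies $P(D_i = 1) = \Phi(\mu_i/\sigma_i)$, where $\Phi$ is the standard normal CDF. The sketch below is illustrative only: the function name `prob_up` and the parameter values ($\omega_0 = 0$, $\omega_1 = -0.8$, $\omega_2 = 0$, $\beta = 0.25$) are assumptions chosen for the demonstration, not estimates.

```python
import math


def prob_up(d_lags, duration, omega0, omega1, omega2, beta):
    """P(D_i = 1) under the sign model; d_lags = [D_{i-1}, D_{i-2}, D_{i-3}, D_{i-4}]."""
    mu = omega0 + omega1 * d_lags[0] + omega2 * math.log(duration)
    sigma = math.exp(beta * abs(sum(d_lags[:4])))
    z = mu / sigma
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Assumed parameter values, for illustration only.
params = dict(omega0=0.0, omega1=-0.8, omega2=0.0, beta=0.25)

# Last move was up, but the four most recent directions cancel out: no local trend.
p_no_trend = prob_up([1, -1, 1, -1], 30.0, **params)

# Four consecutive up moves: the variance grows and the reversal effect weakens.
p_trend = prob_up([1, 1, 1, 1], 30.0, **params)
```

With a negative $\omega_1$, both probabilities of a further up move are below one half, but the one computed after four consecutive up moves is closer to one half, which is how the model leaves room for a local trend.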

To allow for different dynamics between positive and negative price movements, 
we use different models for the size of a price change. Specifically, we have 


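$$S_i \mid (D_i = -1, N_i, \Delta t_i, F_{i-1}) \sim p(\lambda_{d,i}) + 1, \quad\text{with} \tag{5.52}$$
$$\ln(\lambda_{d,i}) = \eta_{d,0} + \eta_{d,1} N_i + \eta_{d,2} \ln(\Delta t_i) + \eta_{d,3} S_{i-1},$$

$$S_i \mid (D_i = 1, N_i, \Delta t_i, F_{i-1}) \sim p(\lambda_{u,i}) + 1, \quad\text{with} \tag{5.53}$$
$$\ln(\lambda_{u,i}) = \eta_{u,0} + \eta_{u,1} N_i + \eta_{u,2} \ln(\Delta t_i) + \eta_{u,3} S_{i-1},$$


where $p(\lambda)$ denotes a Poisson distribution with parameter $\lambda$, and 1 is added to the 
size because the minimum size is 1 tick when there is a price change. 

The specified models in Eqs. (5.48)–(5.53) can be estimated jointly by either 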
the maximum-likelihood method or the Markov chain Monte Carlo methods. Based 
on Eq. (5.47), the models consist of six conditional models that can be estimated 
separately. 


Example 5.5. Consider the intraday transactions of IBM stock on November 
21, 1990. There are 194 price changes within normal trading hours. Figure 5.17 
shows the histograms of $\ln(\Delta t_i)$, $N_i$, $D_i$, and $S_i$. The data for $D_i$ are about 
equally distributed between “upward” and “downward” movements. Only a few 
transactions resulted in a price change of more than 1 tick; as a matter of fact, there 
were 7 changes with 2 ticks and 1 change with 3 ticks. Using Markov chain Monte Carlo 
(MCMC) methods (see Chapter 12), we obtained the following models for the data. 
The reported estimates and their standard deviations are the posterior means and 
standard deviations of MCMC draws with 9500 iterations. 

Figure 5.17 Histograms of intraday transactions data for IBM stock on November 21, 1990: (a) log 
durations between price changes, (b) direction of price movement, (c) size of price change measured in 
ticks, and (d) number of trades without price change. 

The model for the time duration between price changes is 


$$\ln(\Delta t_i) = 4.023 + 0.032 \ln(\Delta t_{i-1}) - 0.025 S_{i-1} + 1.403\epsilon_i,$$


where standard deviations of the coefficients are 0.415, 0.073, 0.384, and 0.073, 
respectively. The fitted model indicates that there was no dynamic dependence in 
the time duration. For the $N_i$ variable, we have 


$$\Pr(N_i > 0 \mid \Delta t_i, F_{i-1}) = \mathrm{logit}[-0.637 + 1.740 \ln(\Delta t_i)],$$


where standard deviations of the estimates are 0.238 and 0.248, respectively. Thus, 
as expected, the number of trades with no price change in the time interval $(t_{i-1}, t_i)$ 
depends positively on the length of the interval. The magnitude of $N_i$ when it is 
positive is 


$$N_i \mid (N_i > 0, \Delta t_i, F_{i-1}) \sim 1 + g(\lambda_i), \qquad
\lambda_i = \frac{\exp[0.178 - 0.910 \ln(\Delta t_i)]}{1 + \exp[0.178 - 0.910 \ln(\Delta t_i)]},$$


where standard deviations of the estimates are 0.246 and 0.138, respectively. The 
negative and significant coefficient of $\ln(\Delta t_i)$ means that $N_i$ is positively related 
to the length of the duration $\Delta t_i$ because a large $\ln(\Delta t_i)$ implies a small 
$\lambda_i$, which in turn implies higher probabilities for larger $N_i$; see the geometric 
distribution in Eq. (5.27). 
The fitted model for $D_i$ is 

$$\mu_i = 0.049 - 0.840 D_{i-1} - 0.004 \ln(\Delta t_i),$$


$$\ln(\sigma_i) = 0.244\,|D_{i-1} + D_{i-2} + D_{i-3} + D_{i-4}|,$$


where standard deviations of the parameters in the mean equation are 0.129, 0.132, 
and 0.082, respectively, whereas the standard error for the parameter in the variance 
equation is 0.182. The price reversal is clearly shown by the highly significant 
negative coefficient of $D_{i-1}$. The marginally significant parameter in the variance 
equation is exactly as expected. Finally, the fitted models for the size of a price 
change are 


$$\ln(\lambda_{d,i}) = 1.024 - 0.327 N_i + 0.412 \ln(\Delta t_i) - 4.474 S_{i-1},$$
$$\ln(\lambda_{u,i}) = -3.683 - 1.542 N_i + 0.419 \ln(\Delta t_i) + 0.921 S_{i-1},$$


where standard deviations of the parameters for the “down size” are 3.350, 0.319, 
0.599, and 3.188, respectively, whereas those for the “up size” are 1.734, 0.976, 
0.453, and 1.459. The interesting estimates of the prior two equations are the 
negative estimates of the coefficient of $N_i$. A large $N_i$ means there were more 
transactions in the time interval $(t_{i-1}, t_i)$ with no price change. This can be taken 
as evidence of no new information available in the time interval $(t_{i-1}, t_i)$. 
Consequently, the size for the price change at $t_i$ should be small. A small 
$\lambda_{u,i}$ or $\lambda_{d,i}$ for a Poisson distribution gives precisely that. 

In summary, although a sample of 194 observations in a given day may not 
contain sufficient information about the trading dynamics of IBM stock, the 
fitted models appear to provide some sensible results. McCulloch and Tsay (2000) 
extend the PCD model to a hierarchical framework to handle all the data of the 
63 trading days between November 1, 1990, and January 31, 1991. Many of the 
parameter estimates become significant in this extended sample, which has more 
than 19,000 observations. For example, the overall estimate of the coefficient of 
$\ln(\Delta t_{i-1})$ in the model for time duration ranges from 0.04 to 0.1, which is small, 
but significant. 

Finally, using transactions data to test microstructure theory often requires a 
careful specification of the variables used. It also requires a deep understanding of 
the way by which the market operates and the data are collected. However, ideas of 
the econometric models discussed in this chapter are useful and widely applicable 
in analysis of high-frequency data. 


5.8 APPLICATION 


In this section we apply the ACD model to stock volatility modeling. Consider the 
daily range of the log price of Apple stock from January 4, 1999, to November 20, 
2007. The data are obtained from Yahoo Finance and consist of 2235 observations. 
This series was analyzed in Tsay (2009). The range of daily log prices has been 
used in the literature as a robust alternative to volatility modeling; see Chapter 3 
and Chou (2005) and the references therein. Apple stock had two-for-one splits on 
June 21, 2000, and February 28, 2005, during the sample period, but no adjustments 
are needed for the splits because we use the daily range of log price. As mentioned 
before, stock prices in the U.S. markets switched from the tick size of $1/16$ of a dollar 
to the decimal system on January 29, 2001. Such a change affected the bid–ask 
spread of stock prices. We shall employ intervention analysis to study the impact 
of such a policy change on the stock volatility. 

The sample mean, standard deviation, minimum, and maximum of the range 
of log prices are 0.0407, 0.0218, 0.0068, and 0.1468, respectively. The sample 
skewness and excess kurtosis are 1.3 and 2.13, respectively. Figure 5.18(a) shows 
the time plot of the range series. The volatility seems to be increasing from 2000 
to 2001, then decreasing to a stable level after 2002. It seems to increase somewhat 
at the end of the series. Figure 5.19(a) shows the sample ACF of the daily range 
series. The sample ACFs are highly significant and decay slowly. 

We fit EACD(1,1), WACD(1,1), and GACD(1,1) models to the daily range 
series. The estimation results, along with the Ljung–Box statistics for the 
standardized residual series and its squared process, are given in Table 5.9. The 
parameter estimates for the duration equation are stable for all three models, except 
for the constant term of the EACD model, which appears to be statistically insignificant 
at the usual 5% level. Indeed, in this particular instance, the EACD(1,1) model 
fares slightly worse than the other two ACD models. Between the WACD(1,1) and 
GACD(1,1) models, we slightly prefer the GACD(1,1) model because it fits the 
data better and is more flexible. 

Figure 5.18 Time plots of daily range of log price of Apple stock from January 4, 1999, to November 
20, 2007: (a) observed daily range and (b) standardized residuals of a GACD(1,1) model. 

Figure 5.19 Sample autocorrelation function of daily range of log prices of Apple stock from January 
4, 1999, to November 20, 2007: (a) ACF of daily range and (b) ACF of standardized residual series of 
GACD(1,1) model. 


TABLE 5.9 Estimation Results of EACD(1,1), WACD(1,1), and GACD(1,1) Models 
for Daily Range of Log Prices of Apple Stock from January 4, 1999, to November 20, 
2007^a 

                          Parameters                                Checking 
Model   $\alpha_0$   $\alpha_1$   $\beta_1$   $\alpha$   $\kappa$    Q(10)     Q*(10) 
EACD     0.0007       0.133        0.849                             16.65     12.12 
        (0.0005)     (0.036)      (0.044)                           (0.082)   (0.277) 
WACD     0.0013       0.131        0.835       2.377                 13.66      9.74 
        (0.0003)     (0.015)      (0.021)     (0.031)               (0.189)   (0.464) 
GACD     0.0010       0.133        0.843       1.622      2.104      14.62     11.21 
        (0.0002)     (0.015)      (0.019)     (0.029)    (0.040)    (0.147)   (0.341) 

^a The standard errors of the estimates and the p values of the Ljung–Box statistics are in parentheses, 
where Q(10) and Q*(10) are for standardized residual series and its squared process, respectively. 


Figure 5.19(b) shows the sample ACFs of the standardized residuals of the 
fitted GACD(1,1) model. From the plot, the standardized residuals do not have 
significant serial correlations, even though the lag-1 sample ACF is slightly above 
its two standard error limit. The lag-1 serial correlation is removed when we use 
nonlinear ACD models later. Figure 5.18(b) shows the time plot of the standardized 
residuals of the GACD(1,1) model. The residuals do not show any pattern of 
model inadequacy. The mean, standard deviation, minimum, and maximum of the 
standardized residuals are 0.999, 0.436, 0.203, and 4.497, respectively. 

It is interesting to see that the estimates of the shape parameter $\alpha$ are greater 
than 1 for both WACD(1,1) and GACD(1,1) models, indicating that the hazard 
function of the daily range is monotonically increasing. This is consistent with the 
idea of volatility clustering, for large volatility tends to be followed by another 
large volatility. 
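The Q(10) and Q*(10) columns of Table 5.9 are Ljung–Box statistics, $Q(m) = T(T+2)\sum_{k=1}^{m}\hat{\rho}_k^2/(T-k)$, computed from the standardized residuals and from their squares. A minimal Python sketch of the statistic follows; the function name `ljung_box` and the toy alternating series are illustrative assumptions, and the p value (a chi-squared tail probability) is left out to keep the example within the standard library.

```python
def ljung_box(x, m):
    """Ljung-Box statistic Q(m) based on the first m sample autocorrelations of x."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    q = 0.0
    for k in range(1, m + 1):
        # Lag-k sample autocorrelation.
        num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
        rho_k = num / denom
        q += rho_k * rho_k / (n - k)
    return n * (n + 2) * q


# Toy series with strong negative lag-1 dependence (not Apple data).
toy = [1.0, -1.0] * 10
q1 = ljung_box(toy, 1)
```

For a fitted ACD model, the same function would be applied once to the standardized residuals $x_i/\hat{\psi}_i$ to obtain Q(m) and once to their squares to obtain Q*(m).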


Threshold ACD model 

To refine the GACD(1,1) model for the daily range of log prices of Apple stock, 
we employ a two-regime threshold WACD(1,1) model. Some preliminary analysis 
of the threshold WACD models indicates that the major difference in the parameter 
estimates between the two regimes is the shape parameter of the Weibull distribu- 
tion. Thus, we focus on a TWACD(2;1,1) model with different shape parameters 
for the two regimes. 

Table 5.10 gives the maximized log-likelihood value of a TWACD(2;1,1) model 
with delay $d = 1$ and threshold $r \in \{x_{(q)} \mid q = 60, 65, \ldots, 95\}$, where 
$x_{(q)}$ denotes the sample $q$th percentile. From the table, the threshold 0.04753 is 
selected, which is the 70th percentile of the data. The fitted model is 


$$x_i = \psi_i \epsilon_i, \qquad \psi_i = 0.0013 + 0.1539 x_{i-1} + 0.8131 \psi_{i-1},$$


where the standard errors of the coefficients are 0.0003, 0.0164, and 0.0215, 
respectively, and $\epsilon_i$ follows the standardized Weibull distribution as 


$$\epsilon_i \sim \begin{cases} W(2.2756) & \text{if } x_{i-1} \le 0.04753, \\ W(2.7119) & \text{otherwise,} \end{cases}$$


where the standard errors of the two shape parameters are 0.0394 and 0.0717, 
respectively. 
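The recursion behind the fitted threshold model can be sketched as follows. The coefficients are the reported estimates; the function name `twacd_filter`, the starting value `psi0`, and the three toy ranges are assumptions made only for illustration.

```python
def twacd_filter(x, a0, a1, b1, threshold, shape_low, shape_high, psi0):
    """Return conditional expected values psi_i and the Weibull shape used at each i.

    psi0 is an assumed starting value for the recursion (for example the sample mean).
    """
    psi = [psi0]
    shapes = [None]  # no regime is defined for the first observation
    for i in range(1, len(x)):
        psi.append(a0 + a1 * x[i - 1] + b1 * psi[i - 1])
        # The regime is decided by the lagged observation x_{i-1} (delay d = 1).
        shapes.append(shape_low if x[i - 1] <= threshold else shape_high)
    return psi, shapes


# Toy daily ranges (hypothetical), filtered with the reported estimates.
x = [0.03, 0.06, 0.04]
psi, shapes = twacd_filter(x, a0=0.0013, a1=0.1539, b1=0.8131,
                           threshold=0.04753, shape_low=2.2756,
                           shape_high=2.7119, psi0=0.04)
```

The standardized residuals $x_i/\psi_i$ produced by such a filter are the quantities whose autocorrelations are checked after fitting.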


TABLE 5.10 Selection of Threshold of TWACD(2;1,1) Model for Daily Range of 
Log Prices of Apple Stock from January 4, 1999, to November 20, 2007^a 

Quantile                   60      65      70      75      80      85      90      95 
$r \times 100$             4.03    4.37    4.75    5.15    5.58    6.16    7.07    8.47 
$\ell(r) \times 10^{-3}$   6.073   6.076   6.079   6.076   6.078   6.074   6.072   6.066 

^a The threshold variable is $x_{i-1}$. 

Figure 5.20 Model fitting for daily range of log price of Apple stock from January 4, 1999, to 
November 20, 2007: (a) conditional expected durations of fitted TWACD(2;1,1) model and (b) sample 
ACF of standardized residuals. 


Figure 5.20(a) shows the time plot of the conditional expected duration for 
the fitted TWACD(2;1,1) model, that is, $\hat{\psi}_i$, whereas Figure 5.20(b) gives the 
residual ACFs for the fitted model. All residual ACFs are within the two 
standard error limits. Indeed, we have Q(1) = 4.01(0.05) and Q(10) = 9.84(0.45) for 
the standardized residuals and Q*(1) = 0.83(0.36) and Q*(10) = 9.35(0.50) for 
the squared series of the standardized residuals, where the number in parentheses 
denotes p value. Note that the threshold variable $x_{i-1}$ is also selected based on the 
value of the log-likelihood function. For instance, the log-likelihood function of 
the TWACD(2;1,1) model assumes the values $6.069 \times 10^3$ and $6.070 \times 10^3$, 
respectively, for $d = 2$ and 3 when the threshold is 0.04753. These values are lower than 
that when $d = 1$. 


Intervention Analysis 

High-frequency financial data are often influenced by external events, for example, 
an increase or drop in interest rates by the U.S. Federal Open Market Committee 
or a jump in the oil price. Applications of ACD models in finance are often faced 
with the problem of outside interventions. To handle the effects of external events, 
the intervention analysis of Box and Tiao (1975) can be used. Here we apply the 
analysis to the daily range series of Apple stock to study the impact of change in 
tick size on the stock volatility. 

Let $t_0$ be the time of intervention. For the Apple stock, $t_0 = 522$, which 
corresponds to January 26, 2001, the last trading day before the change in tick size. 
Since more observations in the sample are after the intervention, we define the 
indicator variable 


$$I_i^{(t_0)} = \begin{cases} 1 & \text{if } i \le t_0, \\ 0 & \text{otherwise,} \end{cases}$$


to signify the absence of intervention. Since a larger tick size tends to increase the 
observed daily price range, it is reasonable to assume that the conditional expected 
range would be higher before the intervention. A simple intervention model for the 
daily range of Apple stock is then given by 


$$x_i = \begin{cases} \psi_i \epsilon_{1i} & \text{if } x_{i-1} \le 0.04753, \\ \psi_i \epsilon_{2i} & \text{otherwise,} \end{cases}$$


where $\psi_i$ follows the model 

$$\psi_i = \alpha_0 + \gamma I_i^{(t_0)} + \alpha_1 x_{i-1} + \beta_1 \psi_{i-1}, \tag{5.54}$$


where $\gamma$ denotes the decrease in expected duration due to the decimalization of 
stock prices. In other words, the expected durations before and after the intervention 
are 

$$\frac{\alpha_0 + \gamma}{1 - \alpha_1 - \beta_1} \quad\text{and}\quad \frac{\alpha_0}{1 - \alpha_1 - \beta_1},$$

respectively. We expect $\gamma > 0$. 

The fitted duration equation for the intervention model is 


$$\psi_i = 0.0021 + 0.0011 I_i^{(522)} + 0.1595 x_{i-1} + 0.7828 \psi_{i-1},$$


where the standard errors of the estimates are 0.0004, 0.0003, 0.0177, and 0.0264, 
respectively. The estimate $\hat{\gamma}$ is significant at the 1% level. For the innovations, we 
have 


$$\epsilon_i \sim \begin{cases} W(2.2835) & \text{if } x_{i-1} \le 0.04753, \\ W(2.7322) & \text{otherwise.} \end{cases}$$

The standard errors of the two estimates of the shape parameter are 0.0413 and 
0.0780, respectively. Figure 5.21(a) shows the expected durations of the 
intervention model, and Figure 5.21(b) shows the ACF of the standardized residuals. 
All residual ACFs are within the two standard error limits. Indeed, for the 
standardized residuals, we have Q(1) = 2.37(0.12) and Q(10) = 6.24(0.79). For 
the squared series of the standardized residuals, we have Q*(1) = 0.34(0.56) and 
Q*(10) = 6.79(0.75). As expected, $\hat{\gamma} > 0$ so that the decimalization indeed reduces 
the expected value of the daily range. This simple analysis shows that, as expected, 
adopting the decimal system reduces the volatility of Apple stock. 

Figure 5.21 Model fitting for daily range of log price of Apple stock from January 4, 1999, to 
November 20, 2007: (a) conditional expected durations of fitted TWACD(2;1,1) model with intervention 
and (b) sample ACF of corresponding standardized residuals. 
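The size of the intervention effect can be read off from the two expected levels implied by Eq. (5.54). The short calculation below plugs in the reported estimates; the variable names are illustrative, and the rounded point estimates are used without accounting for their standard errors.

```python
# Reported estimates of the intervention model.
alpha0, gamma, alpha1, beta1 = 0.0021, 0.0011, 0.1595, 0.7828

persistence = alpha1 + beta1          # must be below 1 for a finite expected level
denominator = 1.0 - persistence

expected_before = (alpha0 + gamma) / denominator   # indicator equal to 1 (before decimalization)
expected_after = alpha0 / denominator              # indicator equal to 0 (after decimalization)
relative_drop = 1.0 - expected_after / expected_before
```

With these values the expected daily range is roughly 0.055 before the change in tick size and roughly 0.036 after it, a reduction of about one third.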


APPENDIX A: REVIEW OF SOME PROBABILITY DISTRIBUTIONS 


Exponential Distribution 
A random variable $X$ has an exponential distribution with parameter $\beta > 0$ if its 
probability density function (pdf) is given by 


$$f(x|\beta) = \begin{cases} \dfrac{1}{\beta} e^{-x/\beta} & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$


Denoting such a distribution by $X \sim \exp(\beta)$, we have $E(X) = \beta$ and 
$\mathrm{Var}(X) = \beta^2$. The cumulative distribution function (CDF) of $X$ is 


$$F(x|\beta) = \begin{cases} 0 & \text{if } x < 0, \\ 1 - e^{-x/\beta} & \text{if } x \ge 0. \end{cases}$$


When $\beta = 1$, $X$ is said to have a standard exponential distribution. 


Gamma Function 
For $\kappa > 0$, the gamma function $\Gamma(\kappa)$ is defined by 


$$\Gamma(\kappa) = \int_0^\infty x^{\kappa-1} e^{-x}\,dx.$$

The most important properties of the gamma function are: 


1. For any $\kappa > 1$, $\Gamma(\kappa) = (\kappa - 1)\Gamma(\kappa - 1)$. 
2. For any positive integer $m$, $\Gamma(m) = (m - 1)!$. 
3. $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$. 


The integration 

$$\Gamma(y|\kappa) = \int_0^y x^{\kappa-1} e^{-x}\,dx, \qquad y > 0,$$


is an incomplete gamma function. Its values have been tabulated in the literature. 
Computer programs are now available to evaluate the incomplete gamma function. 

Gamma Distribution 
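A random variable $X$ has a gamma distribution with parameters $\kappa$ and $\beta$ 
($\kappa > 0$, $\beta > 0$) if its pdf is given by 


$$f(x|\kappa, \beta) = \begin{cases} \dfrac{1}{\beta^\kappa \Gamma(\kappa)} x^{\kappa-1} e^{-x/\beta} & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$


By changing variable $y = x/\beta$, one can easily obtain the moments of $X$: 


$$\begin{aligned}
E(X^m) &= \int_0^\infty x^m f(x|\kappa, \beta)\,dx 
        = \frac{1}{\beta^\kappa \Gamma(\kappa)} \int_0^\infty x^{\kappa+m-1} e^{-x/\beta}\,dx \\
       &= \frac{\beta^m}{\Gamma(\kappa)} \int_0^\infty y^{\kappa+m-1} e^{-y}\,dy 
        = \frac{\beta^m \Gamma(\kappa + m)}{\Gamma(\kappa)}.
\end{aligned}$$


In particular, the mean and variance of $X$ are $E(X) = \kappa\beta$ and 
$\mathrm{Var}(X) = \kappa\beta^2$. When $\beta = 1$, the distribution is called a standard 
gamma distribution with parameter $\kappa$. We use the notation $G \sim \mathrm{gamma}(\kappa)$ 
to denote that $G$ follows a standard gamma distribution with parameter $\kappa$. The 
moments of $G$ are 


$$E(G^m) = \frac{\Gamma(\kappa + m)}{\Gamma(\kappa)}, \qquad m > 0. \tag{5.55}$$


Weibull Distribution 
A random variable $X$ has a Weibull distribution with parameters $\alpha$ and $\beta$ 
($\alpha > 0$, $\beta > 0$) if its pdf is given by 


$$f(x|\alpha, \beta) = \begin{cases} \dfrac{\alpha}{\beta^\alpha} x^{\alpha-1} e^{-(x/\beta)^\alpha} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0, \end{cases}$$


where $\beta$ and $\alpha$ are the scale and shape parameters of the distribution. The mean 
and variance of $X$ are 


$$E(X) = \beta\,\Gamma\!\left(1 + \frac{1}{\alpha}\right), \qquad
\mathrm{Var}(X) = \beta^2 \left\{ \Gamma\!\left(1 + \frac{2}{\alpha}\right) - \left[\Gamma\!\left(1 + \frac{1}{\alpha}\right)\right]^2 \right\},$$


and the CDF of $X$ is 


$$F(x|\alpha, \beta) = \begin{cases} 0 & \text{if } x < 0, \\ 1 - e^{-(x/\beta)^\alpha} & \text{if } x \ge 0. \end{cases}$$


When $\alpha = 1$, the Weibull distribution reduces to an exponential distribution. 
Define $Y = X/[\beta\Gamma(1 + 1/\alpha)]$. We have $E(Y) = 1$ and the pdf of $Y$ is 


$$f(y|\alpha) = \begin{cases} \alpha \left[\Gamma\!\left(1 + \dfrac{1}{\alpha}\right)\right]^{\alpha} y^{\alpha-1} \exp\left\{ -\left[\Gamma\!\left(1 + \dfrac{1}{\alpha}\right) y\right]^{\alpha} \right\} & \text{if } y \ge 0, \\ 0 & \text{otherwise,} \end{cases} \tag{5.56}$$


where the scale parameter $\beta$ disappears due to standardization. The CDF of the 
standardized Weibull distribution is 


$$F(y|\alpha) = \begin{cases} 0 & \text{if } y < 0, \\ 1 - \exp\left\{ -\left[\Gamma\!\left(1 + \dfrac{1}{\alpha}\right) y\right]^{\alpha} \right\} & \text{if } y \ge 0, \end{cases}$$


and we have $E(Y) = 1$ and $\mathrm{Var}(Y) = \Gamma(1 + 2/\alpha)/[\Gamma(1 + 1/\alpha)]^2 - 1$. 
For a duration model with Weibull innovations, the pdf in Eq. (5.56) is used in the 
maximum-likelihood estimation. 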
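The unit-mean property of the standardized Weibull density in Eq. (5.56) is easy to confirm numerically. The sketch below uses only the Python standard library; the function names and the midpoint-rule settings (upper limit 20, 20,000 steps) are illustrative choices, and the shape 2.377 is the WACD(1,1) estimate from Table 5.9.

```python
import math


def std_weibull_pdf(y, alpha):
    """Standardized Weibull density of Eq. (5.56); returns 0 for y <= 0.

    Setting the value at y = 0 to zero is a simplification that does not
    affect integrals (for alpha < 1 the density is unbounded at the origin).
    """
    if y <= 0.0:
        return 0.0
    c = math.gamma(1.0 + 1.0 / alpha)
    return alpha * c**alpha * y ** (alpha - 1.0) * math.exp(-((c * y) ** alpha))


def numeric_mean(alpha, upper=20.0, steps=20000):
    """Midpoint-rule approximation of the integral of y * f(y|alpha) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        total += y * std_weibull_pdf(y, alpha)
    return total * h


mean_exponential = numeric_mean(1.0)    # alpha = 1 gives the standard exponential
mean_wacd = numeric_mean(2.377)         # shape estimate of the WACD(1,1) model
```

Both integrals come out equal to 1 up to numerical error, which is the normalization an ACD model needs so that $\psi_i$ is the conditional expectation of $x_i$.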


Generalized Gamma Distribution 


A random variable $X$ has a generalized gamma distribution with parameters $\alpha$, $\beta$, 
$\kappa$ ($\alpha > 0$, $\beta > 0$, and $\kappa > 0$) if its pdf is given by 


$$f(x|\alpha, \beta, \kappa) = \begin{cases} \dfrac{\alpha x^{\kappa\alpha-1}}{\beta^{\kappa\alpha}\Gamma(\kappa)} \exp\left[ -\left(\dfrac{x}{\beta}\right)^{\alpha} \right] & \text{if } x \ge 0, \\ 0 & \text{otherwise,} \end{cases}$$

where $\beta$ is a scale parameter, and $\alpha$ and $\kappa$ are shape parameters. This distribution 
can be written as 

$$X = \beta G^{1/\alpha},$$

where $G$ is a standard gamma random variable with parameter $\kappa$. The pdf of $X$ 
can be obtained from that of $G$ by the technique of changing variables. Similarly, 
the moments of $X$ can be obtained from that of $G$ in Eq. (5.55) by 


$$E(X^m) = E[(\beta G^{1/\alpha})^m] = \beta^m E(G^{m/\alpha}) = \frac{\beta^m \Gamma(\kappa + m/\alpha)}{\Gamma(\kappa)}.$$


When $\kappa = 1$, the generalized gamma distribution reduces to that of a Weibull 
distribution. Thus, the exponential and Weibull distributions are special cases of 
the generalized gamma distribution. 
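The moment formula and the two special cases can be verified with a few lines of Python. The function name `gengamma_moment` and the parameter values in the checks are illustrative assumptions; the log-gamma function is used only to avoid overflow for large arguments.

```python
import math


def gengamma_moment(m, alpha, beta, kappa):
    """E(X^m) for the generalized gamma distribution: beta^m * Gamma(kappa + m/alpha) / Gamma(kappa)."""
    return beta**m * math.exp(math.lgamma(kappa + m / alpha) - math.lgamma(kappa))


# kappa = 1 should reproduce the Weibull mean beta * Gamma(1 + 1/alpha).
weibull_mean = gengamma_moment(1, alpha=2.0, beta=3.0, kappa=1.0)

# alpha = kappa = 1 should reproduce the exponential moments E(X) = beta, E(X^2) = 2 beta^2.
exp_mean = gengamma_moment(1, alpha=1.0, beta=3.0, kappa=1.0)
exp_second = gengamma_moment(2, alpha=1.0, beta=3.0, kappa=1.0)
```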

The expectation of a generalized gamma distribution is 
$E(X) = \beta\Gamma(\kappa + 1/\alpha)/\Gamma(\kappa)$. In duration models, we need a 
distribution with unit expectation. Therefore, defining a random variable 
$Y = \lambda X/\beta$, where $\lambda = \Gamma(\kappa)/\Gamma(\kappa + 1/\alpha)$, we have 
$E(Y) = 1$ and the pdf of $Y$ is 


$$f(y|\alpha, \kappa) = \begin{cases} \dfrac{\alpha y^{\kappa\alpha-1}}{\lambda^{\kappa\alpha}\Gamma(\kappa)} \exp\left[ -\left(\dfrac{y}{\lambda}\right)^{\alpha} \right] & \text{if } y > 0, \\ 0 & \text{otherwise,} \end{cases} \tag{5.57}$$


where again the scale parameter $\beta$ disappears and $\lambda = \Gamma(\kappa)/\Gamma(\kappa + 1/\alpha)$. 


APPENDIX B: HAZARD FUNCTION 


A useful concept in modeling duration is the hazard function implied by a 
distribution function. For a random variable $X$, the survival function is defined as 


$$S(x) = P(X > x) = 1 - P(X \le x) = 1 - \mathrm{CDF}(x), \qquad x > 0,$$


which gives the probability that a subject, which follows the distribution of $X$, 
survives at the time $x$. The hazard function (or intensity function) of $X$ is then 
defined by 


$$h(x) = \frac{f(x)}{S(x)}, \tag{5.58}$$


where $f(\cdot)$ and $S(\cdot)$ are the pdf and survival function of $X$, respectively. 

Example 5.6. For the Weibull distribution with parameters $\alpha$ and $\beta$, the 
survival function and hazard function are 


$$S(x|\alpha, \beta) = \exp\left[ -\left(\frac{x}{\beta}\right)^{\alpha} \right], \qquad
h(x|\alpha, \beta) = \frac{\alpha}{\beta^{\alpha}} x^{\alpha-1}, \qquad x > 0.$$

In particular, when $\alpha = 1$, we have $h(x|\beta) = 1/\beta$. Therefore, for an exponential 
distribution, the hazard function is constant. For a Weibull distribution, the hazard 
is a monotone function. If $\alpha > 1$, then the hazard function is monotonically 
increasing. If $\alpha < 1$, the hazard function is monotonically decreasing. For the 
generalized gamma distribution, the survival function and, hence, the hazard function 
involve the incomplete gamma function. Yet the hazard function may exhibit various 
patterns, including U shape or inverted U shape. Thus, the generalized gamma 
distribution provides a flexible approach to modeling the duration of stock 
transactions. 
For the standardized Weibull distribution, the survival and hazard functions are 


$$S(y|\alpha) = \exp\left\{ -\left[\Gamma\!\left(1 + \frac{1}{\alpha}\right) y\right]^{\alpha} \right\}, \qquad
h(y|\alpha) = \alpha \left[\Gamma\!\left(1 + \frac{1}{\alpha}\right)\right]^{\alpha} y^{\alpha-1}, \qquad y > 0.$$
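The Weibull hazard can be checked directly against the definition $h(x) = f(x)/S(x)$. The Python sketch below is illustrative: the function names, the grid of evaluation points, and the shape values 0.8, 1, and 2.377 are choices made for the demonstration (2.377 is the WACD(1,1) shape estimate from Table 5.9).

```python
import math


def weibull_pdf(x, alpha, beta):
    return (alpha / beta**alpha) * x ** (alpha - 1.0) * math.exp(-((x / beta) ** alpha))


def weibull_survival(x, alpha, beta):
    return math.exp(-((x / beta) ** alpha))


def weibull_hazard(x, alpha, beta):
    """Hazard from the definition h(x) = f(x) / S(x), for x > 0."""
    return weibull_pdf(x, alpha, beta) / weibull_survival(x, alpha, beta)


grid = [0.5, 1.0, 1.5, 2.0]
increasing = [weibull_hazard(x, 2.377, 1.0) for x in grid]   # alpha > 1
constant = [weibull_hazard(x, 1.0, 2.0) for x in grid]       # alpha = 1, equals 1/beta
decreasing = [weibull_hazard(x, 0.8, 1.0) for x in grid]     # alpha < 1
```

The three lists are increasing, flat at $1/\beta = 0.5$, and decreasing, respectively, in line with the discussion in Example 5.6.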


APPENDIX C: SOME RATS PROGRAMS FOR DURATION MODELS 


The data used are adjusted time durations of intraday transactions of IBM stock 
from November 1 to November 9, 1990. The file name is ibm1to5.txt and it has 
3534 observations. 


Program for Estimating a WACD(1,1) Model 


all 0 3534:1 
open data ibm1to5.txt 
data(org=obs) / x r1 
set psi = 1.0 
nonlin a0 a1 b1 al 
frml gvar = a0+a1*x(t-1)+b1*psi(t-1) 
frml gma = %LNGAMMA(1.0+1.0/al) 
frml gln =al*gma(t)+log(al)-log(x(t)) $ 
  +al*log(x(t)/(psi(t)=gvar(t)))-(exp(gma(t))*x(t)/psi(t))**al 
smpl 2 3534 
compute a0 = 0.2, a1 = 0.1, b1 = 0.1, al = 0.… 
maximize(method=bhhh,recursive,iterations=150) gln 
set fv = gvar(t) 
set resid = x(t)/fv(t) 
set residsq = resid(t)*resid(t) 
cor(qstats,number=20,span=10) resid 
cor(qstats,number=20,span=10) residsq 
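For readers without RATS, the log-likelihood that the `gln` formula accumulates can be written as a short Python function. This is a sketch of the same calculation rather than a translation of the whole program: the function name `wacd_loglik`, the starting value `psi0 = 1.0` (mirroring `set psi = 1.0`), and the toy durations are assumptions, and the maximization step is left to whatever optimizer is available.

```python
import math


def wacd_loglik(params, x, psi0=1.0):
    """Log-likelihood of a WACD(1,1) model with standardized Weibull innovations.

    params = (a0, a1, b1, al), where al is the Weibull shape parameter.
    The sum starts at the second observation, as in the RATS sample range.
    """
    a0, a1, b1, al = params
    lg = math.lgamma(1.0 + 1.0 / al)   # log Gamma(1 + 1/al)
    psi = psi0
    total = 0.0
    for t in range(1, len(x)):
        psi = a0 + a1 * x[t - 1] + b1 * psi   # conditional expected duration
        total += (al * lg + math.log(al) - math.log(x[t])
                  + al * math.log(x[t] / psi)
                  - (math.exp(lg) * x[t] / psi) ** al)
    return total


# Toy adjusted durations (hypothetical, not the IBM data).
durations = [1.0, 0.5, 2.0, 1.5]
value = wacd_loglik((0.2, 0.1, 0.7, 1.0), durations)
```

When the shape parameter equals 1, each term reduces to $-\ln\psi_t - x_t/\psi_t$, the EACD(1,1) log-likelihood, which gives a simple consistency check on the formula.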


Program for Estimating a GACD(1,1) Model 


all 0 3534:1 
open data ibm1to5.txt 
data(org=obs) / x r1 
set psi = 1.0 
nonlin a0 a1 b1 al ka 
frml cv = a0+a1*x(t-1)+b1*psi(t-1) 
frml gma = %LNGAMMA(ka) 
frml lam = exp(gma(t))/exp(%LNGAMMA(ka+(1.0/al))) 
frml xlam = x(t)/(lam(t)*(psi(t)=cv(t))) 
frml gln =-gma(t)+log(al/x(t))+ka*al*log(xlam(t)) $ 
  -(xlam(t))**al 
smpl 2 3534 
compute a0 = 0.238, a1 = 0.075, b1 = 0.857, al = …, ka = 4.0 
nlpar(criterion=value,cvcrit=0.00001) 
maximize(method=bhhh,recursive,iterations=150) gln 
set fv = cv(t) 
set resid = x(t)/fv(t) 
set residsq = resid(t)*resid(t) 
cor(qstats,number=20,span=10) resid 
cor(qstats,number=20,span=10) residsq 

Program for Estimating a TAR-WACD(1,1) Model 

The threshold 3.79 is prespecified. 

all 0 3534:1 
open data ibm1to5.txt 
data(org=obs) / x rt 
set psi = 1.0 
nonlin a1 a2 al b0 b2 bl 
frml u = ((x(t-1)-3.79)/abs(x(t-1)-3.79)+1.0)/2.0 
frml cp1 = a1*x(t-1)+a2*psi(t-1) 
frml gma1 = %LNGAMMA(1.0+1.0/al) 
frml cp2 = b0+b2*psi(t-1) 
frml gma2 = %LNGAMMA(1.0+1.0/bl) 
frml cp = cp1(t)*(1-u(t))+cp2(t)*u(t) 
frml gln1 =al*gma1(t)+log(al)-log(x(t)) $ 
  +al*log(x(t)/(psi(t)=cp(t)))-(exp(gma1(t))*x(t)/psi(t))**al 
frml gln2 =bl*gma2(t)+log(bl)-log(x(t)) $ 
  +bl*log(x(t)/(psi(t)=cp(t)))-(exp(gma2(t))*x(t)/psi(t))**bl 
frml gln = gln1(t)*(1-u(t))+gln2(t)*u(t) 
smpl 2 3534 
compute a1 = 0.2, a2 = 0.85, al = 0.9 
compute b0 = 1.8, b2 = 0.5, bl = 0.8 
maximize(method=bhhh,recursive,iterations=150) gln 
set fv = cp(t) 
set resid = x(t)/fv(t) 
set residsq = resid(t)*resid(t) 
cor(qstats,number=20,span=10) resid 
cor(qstats,number=20,span=10) residsq 


EXERCISES 


5.1. 


3:2. 


5.3; 


5.4. 


Let r, be the log return of an asset at time t. Assume that {r;} is a Gaussian 
white noise series with mean 0.05 and variance 1.5. Suppose that the proba- 
bility of a trade at each time point is 40% and is independent of r;. Denote 
the observed return by r’. Is r? serially correlated? If yes, calculate the first 
three lags of autocorrelations of r°. 


Let P, be the observed market price of an asset, which is related to the fun- 
damental value of the asset P* via Eq. (5.9). Assume that A P% = P* — P* , 
forms a Gaussian white noise series with mean zero and variance 1.0. Sup- 
pose that the bid—ask spread is two ticks. What is the lag-1 autocorrela- 
tion of the price change series AP; = P, — P;—-; when the tick size is $4? 
What is the lag-1 autocorrelation of the price change when the tick size 
is $75? 

The file ibm-d2-dur.txt contains the adjusted durations between trades of 
IBM stock on November 2, 1990. The file has three columns consisting of 
day, time of trade measured in seconds from midnight, and adjusted durations. 


(a) Build an EACD model for the adjusted duration and check the fitted 
model. 


(b) Build a WACD model for the adjusted duration and check the fitted 
model. 


(c) Build a GACD model for the adjusted duration and check the fitted 
model. 


(d) Compare the prior three duration models. 


The file mmm9912-dtp.txt contains the transactions data of the stock of 3M 
Company in December 1999. There are three columns: day of the month, time 
of transaction in seconds from midnight, and transaction price. Transactions 
that occurred after 4:00 pm Eastern time are excluded. 


(a) Is there a diurnal pattern in 3M stock trading? You may construct a time 
series n;, which denotes the number of trades in a 5-minute time interval 
to answer this question. 


(b) Use the price series to confirm the existence of a bid—ask bounce in 
intraday trading of 3M stock. 
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SD 


5.6. 


Se. 


5.8. 


(c) Tabulate the frequencies of price change in multiples of tick size $i. 
You may combine changes with 5 ticks or more into a category and those 
with —5 ticks or beyond into another category. 


Consider again the transactions data of 3M stock in December 1999. 


(a) Use the data to construct an intraday 5-minute log return series. Use the 
simple average of all transaction prices within a 5-minute interval as the 
stock price for the interval. Is the series serially correlated? You may use 
Ljung—Box statistics to test the hypothesis with the first 10 lags of the 
sample autocorrelation function. 


(b 


w? 


There are seventy-seven 5-minute returns in a normal trading day. Some 
researchers suggest that the sum of squares of the intraday 5-minute 
returns can be used as a measure of daily volatility. Apply this approach 
and calculate the daily volatility of the log return of 3M stock in Decem- 
ber 1999. Discuss the validity of such a procedure to estimate daily 
volatility. 

The file mmm9912-adur.txt contains an adjusted intraday trading duration 
of 3M stock in December 1999. There are thirty-nine 10-minute time intervals 
in a trading day. Let d; be the average of all log durations for the ith 10- 
minute interval across all trading days in December 1999. Define an adjusted 
duration as t;/exp(d;), where j is in the ith 10-minute interval. Note that 
more sophisticated methods can be used to adjust the diurnal pattern of 
trading duration. Here we simply use a local average. 


(a) Is there a diurnal pattern in the adjusted duration series? Why? 

(b) Build a duration model for the adjusted series using exponential innova- 
tions. Check the fitted model. 

(c) Build a duration model for the adjusted series using Weibull innovations. 
Check the fitted model. 

(d) Build a duration model for the adjusted series using generalized gamma 
innovations. Check the fitted model. 

(e) Compare and comment on the three duration models built before. 

5.7. To gain experience in analyzing high-frequency financial data, consider the
trade data of Boeing stock from December 1 to December 5, 2008. The data
are in five files: taq-td-ba12012008.txt to taq-td-ba12052008.txt.
Each file has five columns, namely hour, minute, second, price, and volume.
Only transactions within the normal trading hours (9:30 AM to 4:00 PM
Eastern time) are kept. Construct a time series of the number of trades in an
intraday 5-minute time interval. Is there any diurnal pattern in the constructed
series? You can simply compute the sample ACF of the series to answer this
question.


5.8. Again, consider the high-frequency data of Boeing stock from December 1
to December 5, 2008. Construct an intraday 5-minute return series. Note that
the price of the stock in a 5-minute interval (e.g., 9:30 to 9:35 AM) is the last
transaction price within the time interval. For simplicity, ignore overnight
returns. Are there serial correlations in the 5-minute return series? Use 10
lags of the ACF and the 5% significance level to perform the test.

5.9. Consider the same problem as in Exercise 5.8, but use 10-minute time
intervals.


5.10. Again, consider the high-frequency data of Boeing stock. Compute the
percentage of consecutive transactions without price change in the sample.


CHAPTER 6


Continuous-Time Models 
and Their Applications 


The price of a financial asset evolves over time and forms a stochastic process, 
which is a statistical term used to describe the evolution of a random variable over 
time. The observed prices are a realization of the underlying stochastic process. The
theory of stochastic processes is the basis on which the observed prices are analyzed
and statistical inference is made.

There are two types of stochastic process for modeling the price of an asset. The 
first type is called the discrete-time stochastic process, in which the price changes at 
discrete time points. All the processes discussed in the previous chapters belong to 
this category. For example, the daily closing price of IBM stock on the New York 
Stock Exchange forms a discrete-time stochastic process. Here the price changes 
only at the closing of a trading day. Price movements within a trading day are 
not necessarily relevant to the observed daily price. The second type of stochastic 
process is the continuous-time process, in which the price changes continuously, 
even though the price is only observed at discrete time points. One can think of 
the price as the “true value” of the stock that always exists and is time varying. 

For both types of process, the price can be continuous or discrete. A continuous 
price can assume any positive real number, whereas a discrete price can only 
assume a countable number of possible values. Assume that the price of an asset is 
a continuous-time stochastic process. If the price is a continuous random variable, 
then we have a continuous-time continuous process. If the price itself is discrete, 
then we have a continuous-time discrete process. Similar classifications apply to 
discrete-time processes. The series of price change in Chapter 5 is an example of 
a discrete-time discrete process. 

In this chapter, we treat the price of an asset as a continuous-time continuous 
stochastic process. Our goal is to introduce the statistical theory and tools needed 
to model financial assets and to price options. We begin the chapter with some
terminology of stock options used in the chapter. In Section 6.2, we provide a brief
introduction to Brownian motion, which is also known as a Wiener process. We
then discuss some diffusion equations and stochastic calculus, including the
well-known Ito lemma. Most option pricing formulas are derived under the assumption
that the price of an asset follows a diffusion equation. We use the Black-Scholes
formula to demonstrate the derivation. Finally, to handle the price variations caused
by rare events (e.g., a profit warning), we also study some simple diffusion models
with jumps.

If the price of an asset follows a diffusion equation, then the price of an option
contingent on the asset can be derived by using hedging methods. However, with
jumps the market becomes incomplete and there is no perfect hedging of options. 
The price of an option is then valued either by using diversifiability of jump risk 
or defining a notion of risk and choosing a price and a hedge that minimize this 
risk. For basic applications of stochastic processes in derivative pricing, see Cox 
and Rubinstein (1985) and Hull (2007). 


6.1 OPTIONS 


A stock option is a financial contract that gives the holder the right to trade a certain 
number of shares of a specified common stock by a certain date for a specified 
price. There are two types of options. A call option gives the holder the right to 
buy the underlying stock; see Chapter 3 for a formal definition. A put option gives 
the holder the right to sell the underlying stock. The specified price in the contract 
is called the strike price or exercise price. The date in the contract is known as the 
expiration date or maturity. American options can be exercised at any time up to 
the expiration date. European options can be exercised only on the expiration date. 

The value of a stock option depends on the value of the underlying stock. Let 
K be the strike price and P be the stock price. A call option is in-the-money when 
P > K, at-the-money when P = K, and out-of-the-money when P < K. A put 
option is in-the-money when P < K, at-the-money when P = K, and out-of-the- 
money when P > K. In general, an option is in-the-money when it would lead to 
a positive cash flow to the holder if it were exercised immediately. An option is 
out-of-the-money when it would lead to a negative cash flow to the holder if it 
were exercised immediately. Finally, an option is at-the-money when it would lead 
to zero cash flow if it were exercised immediately. Obviously, only in-the-money 
options are exercised in practice. For more information on options, see Hull (2007). 
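The moneyness rules above can be written as a small R function. This is only an illustrative sketch: the function name `moneyness` and its arguments are hypothetical and are not part of any package used in this book.

```r
# Classify an option as in-, at-, or out-of-the-money.
# P: current stock price, K: strike price, type: "call" or "put".
moneyness <- function(P, K, type = c("call", "put")) {
  type <- match.arg(type)
  # Cash flow to the holder if the option were exercised immediately
  cash <- if (type == "call") P - K else K - P
  if (cash > 0) {
    "in-the-money"
  } else if (cash < 0) {
    "out-of-the-money"
  } else {
    "at-the-money"
  }
}

moneyness(80, 90, "call")   # stock below strike: call is out-of-the-money
moneyness(80, 90, "put")    # same prices: put is in-the-money
```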


6.2 SOME CONTINUOUS-TIME STOCHASTIC PROCESSES 


In mathematical statistics, a continuous-time continuous stochastic process is
defined on a probability space $(\Omega, F, P)$, where $\Omega$ is a nonempty space, $F$ is a
$\sigma$-field consisting of subsets of $\Omega$, and $P$ is a probability measure; see Chapter 1 of
Billingsley (1986). The process can be written as $\{x(\eta, t)\}$, where $t$ denotes time
and is continuous in $[0, \infty)$. For a given $t$, $x(\eta, t)$ is a real-valued continuous
random variable (i.e., a mapping from $\Omega$ to the real line), and $\eta$ is an element of
$\Omega$. For the price of an asset at time $t$, the range of $x(\eta, t)$ is the set of nonnegative
real numbers. For a given $\eta$, $\{x(\eta, t)\}$ is a time series with values depending on
the time $t$. For simplicity, we write a continuous-time stochastic process as $\{x_t\}$
with the understanding that, for a given $t$, $x_t$ is a random variable. In the literature,
some authors use $x(t)$ instead of $x_t$ to emphasize that $t$ is continuous. However,
we use the same notation $x_t$, but call it a continuous-time stochastic process.


6.2.1 Wiener Process 


In a discrete-time econometric model, we assume that the shocks form a white
noise process, which is not predictable. What is the counterpart of shocks in a
continuous-time model? The answer is the increments of a Wiener process, which
is also known as a standard Brownian motion. There are many ways to define a
Wiener process $\{w_t\}$. We use a simple approach that focuses on the small change
$\Delta w_t = w_{t+\Delta t} - w_t$ associated with a small increment $\Delta t$ in time. A
continuous-time stochastic process $\{w_t\}$ is a Wiener process if it satisfies


1. $\Delta w_t = \epsilon \sqrt{\Delta t}$, where $\epsilon$ is a standard normal random variable; and
2. $\Delta w_t$ is independent of $w_j$ for all $j \le t$.


The second condition is a Markov property saying that conditional on the present
value $w_t$, any past information of the process, $w_j$ with $j < t$, is irrelevant to the
future $w_{t+\ell}$ with $\ell > 0$. From this property, it is easily seen that for any two
nonoverlapping time intervals $\Delta_1$ and $\Delta_2$, the increments $w_{t_1+\Delta_1} - w_{t_1}$ and
$w_{t_2+\Delta_2} - w_{t_2}$ are independent. In finance, this Markov property is related to a
weak form of efficient market.

From the first condition, $\Delta w_t$ is normally distributed with mean zero and
variance $\Delta t$. That is, $\Delta w_t \sim N(0, \Delta t)$, where $\sim$ denotes probability distribution.
Consider next the process $w_t$. We assume that the process starts at $t = 0$ with
initial value $w_0$, which is fixed and often set to zero. Then $w_t - w_0$ can be treated
as a sum of many small increments. More specifically, define $T = t/\Delta t$, where $\Delta t$
is a small positive increment. Then

$$w_t - w_0 = w_{T\Delta t} - w_0 = \sum_{i=1}^{T} \Delta w_i = \sum_{i=1}^{T} \epsilon_i \sqrt{\Delta t},$$

where $\Delta w_i = w_{i\Delta t} - w_{(i-1)\Delta t}$. Because the $\epsilon_i$ are independent, we have

$$E(w_t - w_0) = 0, \qquad \mathrm{Var}(w_t - w_0) = \sum_{i=1}^{T} \Delta t = T\,\Delta t = t.$$

Thus, the increment in $w_t$ from time 0 to time $t$ is normally distributed with
mean zero and variance $t$. To put it formally, for a Wiener process $w_t$, we have
that $w_t - w_0 \sim N(0, t)$. This says that the variance of a Wiener process increases
linearly with the length of time interval.

Figure 6.1 Four simulated Wiener processes.

Figure 6.1 shows four simulated Wiener processes on the unit time interval [0, 1]. 
They are obtained by using a simple version of Donsker’s theorem in the statistical 
literature with $n = 3000$; see Donsker (1951) or Billingsley (1968). The four plots
start with $w_0 = 0$ but drift apart as time increases, illustrating that the variance of
a Wiener process increases with time. A simple time transformation from $[0, 1)$ to
$[0, \infty)$ can be used to obtain simulated Wiener processes for $t \in [0, \infty)$.


Donsker’s Theorem 

Assume that $\{z_i\}_{i=1}^{n}$ is a sequence of independent standard normal random variates.
For any $t \in [0, 1]$, let $[nt]$ be the integer part of $nt$. Define
$w_{n,t} = (1/\sqrt{n}) \sum_{i=1}^{[nt]} z_i$. Then $w_{n,t}$ converges in distribution to a
Wiener process $w_t$ on $[0, 1]$ as $n$ goes to infinity.


R or S-Plus Commands for Generating a Wiener Process 


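n = 3000                     # number of steps on [0, 1]
epsi = rnorm(n,0,1)          # n independent standard normal shocks
w = cumsum(epsi)/sqrt(n)     # partial sums scaled by 1/sqrt(n)
plot(w, type='l')            # line plot of the simulated path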
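As a rough numerical check of the statement that $\mathrm{Var}(w_t - w_0) = t$, the same construction can be repeated over many paths and the cross-sectional variance compared with $t$. The sketch below uses base R only; the seed, the number of paths `m`, and the step count `n` are arbitrary choices made for illustration.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 500                           # steps per path on [0, 1]
m <- 2000                          # number of simulated paths

# Each column is one path: scaled partial sums of N(0,1) shocks
eps <- matrix(rnorm(n * m), nrow = n, ncol = m)
w   <- apply(eps, 2, cumsum) / sqrt(n)

# Sample variance across paths at each time point t = i/n
v <- apply(w, 1, var)

# Compare with the theoretical values t = 0.25, 0.5, 1
cbind(t = c(0.25, 0.5, 1), sample_var = v[c(n / 4, n / 2, n)])
```

With a few thousand paths the sample variances should lie close to the corresponding values of $t$, up to simulation error.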


Remark. A formal definition of a Brownian motion $w_t$ on a probability space
$(\Omega, F, P)$ is that it is a real-valued, continuous stochastic process for $t \ge 0$ with
independent and stationary increments. In other words, $w_t$ satisfies the following:

1. Continuity: The map from $t$ to $w_t$ is continuous almost surely with respect
to the probability measure $P$.


2. Independent increments: If $s \le t$, $w_t - w_s$ is independent of $w_v$ for all $v \le s$.


3. Stationary increments: If $s \le t$, $w_t - w_s$ and $w_{t-s} - w_0$ have the same
probability distribution.


It can be shown that the probability distribution of the increment $w_t - w_s$ is
normal with mean $\mu(t - s)$ and variance $\sigma^2(t - s)$. Furthermore, for any given time
indexes $0 \le t_1 < t_2 < \cdots < t_k$, the random vector $(w_{t_1}, w_{t_2}, \ldots, w_{t_k})$ follows a
multivariate normal distribution. Finally, a Brownian motion is standard if $w_0 = 0$
almost surely, $\mu = 0$, and $\sigma^2 = 1$.


Remark. An important property of Brownian motions is that their paths are
not differentiable almost surely. In other words, for a standard Brownian motion
$w_t$, it can be shown that $dw_t/dt$ does not exist for all elements of $\Omega$ except for
elements in a subset $\Omega_1 \subset \Omega$ such that $P(\Omega_1) = 0$. As a result, we cannot use
the usual integration in calculus to handle integrals involving a standard Brownian
motion when we consider the value of an asset over time. Another approach must be
sought. This is the purpose of discussing Ito’s calculus in the next section.


6.2.2 Generalized Wiener Process 


The Wiener process is a special stochastic process with zero drift and variance 
proportional to the length of the time interval. This means that the rate of change 
in expectation is zero and the rate of change in variance is 1. In practice, the mean 
and variance of a stochastic process can evolve over time in a more complicated 
manner. Hence, further generalization of a stochastic process is needed. To this 
end, we consider the generalized Wiener process in which the expectation has a
drift rate $\mu$ and the rate of variance change is $\sigma^2$. Denote such a process by $x_t$ and
use the notation $dy$ for a small change in the variable $y$. Then the model for $x_t$ is

$$dx_t = \mu\,dt + \sigma\,dw_t, \tag{6.1}$$

where $w_t$ is a Wiener process. If we consider a discretized version of Eq. (6.1),
then

$$x_t - x_0 = \mu t + \sigma\epsilon\sqrt{t}$$

for increment from 0 to $t$. Consequently,

$$E(x_t - x_0) = \mu t, \qquad \mathrm{Var}(x_t - x_0) = \sigma^2 t.$$

The results say that the increment in $x_t$ has a growth rate of $\mu$ for the expectation
and a growth rate of $\sigma^2$ for the variance. In the literature, $\mu$ and $\sigma$ of Eq. (6.1)
are referred to as the drift and volatility parameters of the generalized Wiener
process $x_t$.


6.2.3 Ito Process


The drift and volatility parameters of a generalized Wiener process are time
invariant. If one further extends the model by allowing $\mu$ and $\sigma$ to be functions of the
stochastic process $x_t$, then we have an Ito process. Specifically, a process $x_t$ is an
Ito process if it satisfies

$$dx_t = \mu(x_t, t)\,dt + \sigma(x_t, t)\,dw_t, \tag{6.2}$$

where $w_t$ is a Wiener process. This process plays an important role in mathematical
finance and can be written as

$$x_t = x_0 + \int_0^t \mu(x_s, s)\,ds + \int_0^t \sigma(x_s, s)\,dw_s,$$

where $x_0$ denotes the starting value of the process at time 0 and the last term on the
right-hand side is a stochastic integral. Equation (6.2) is referred to as a stochastic
diffusion equation with $\mu(x_t, t)$ and $\sigma(x_t, t)$ being the drift and diffusion functions,
respectively.

The Wiener process is a special Ito process because it satisfies Eq. (6.2) with
$\mu(x_t, t) = 0$ and $\sigma(x_t, t) = 1$.
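One simple way to get a feel for Eq. (6.2) is to simulate it on a fine time grid, replacing $dt$ by a small step and $dw_t$ by a normal increment with variance equal to that step (an Euler discretization). The sketch below is an illustration rather than a method taken from the text; the function name `euler_ito` and its arguments are hypothetical.

```r
# Euler discretization of dx = mu(x,t) dt + sigma(x,t) dw on [0, horizon].
# mu_fun and sigma_fun are user-supplied functions of (x, t).
euler_ito <- function(x0, mu_fun, sigma_fun, horizon = 1, n = 1000) {
  dt <- horizon / n
  x  <- numeric(n + 1)
  x[1] <- x0
  for (i in 1:n) {
    t_i <- (i - 1) * dt
    dw  <- sqrt(dt) * rnorm(1)       # Wiener increment, N(0, dt)
    x[i + 1] <- x[i] + mu_fun(x[i], t_i) * dt + sigma_fun(x[i], t_i) * dw
  }
  x
}

set.seed(2)                          # arbitrary seed
# Constant drift and volatility: a generalized Wiener process as in Eq. (6.1)
gw <- euler_ito(0, function(x, t) 0.1, function(x, t) 0.3)
# Zero drift and unit volatility: the Wiener process itself
w  <- euler_ito(0, function(x, t) 0, function(x, t) 1)
# With zero volatility the path is deterministic: x_t = x_0 + mu * t
d  <- euler_ito(1, function(x, t) 2, function(x, t) 0, horizon = 1, n = 10)
d[11]                                # equals 1 + 2 * 1 up to rounding
```

Passing the drift and diffusion as functions keeps the same loop usable for state-dependent choices such as $\mu(x_t, t) = \mu x_t$ and $\sigma(x_t, t) = \sigma x_t$.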


6.3 ITO’S LEMMA 


In finance, when using continuous-time models, it is common to assume that the 
price of an asset is an Ito process. Therefore, to derive the price of a financial 
derivative, one needs to use Ito’s calculus. In this section, we briefly review Ito’s 
lemma by treating it as a natural extension of the differentiation in calculus. Ito’s 
lemma is the basis of stochastic calculus. 


6.3.1 Review of Differentiation 


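Let $G(x)$ be a differentiable function of $x$. Using the Taylor expansion, we have

$$\Delta G \equiv G(x + \Delta x) - G(x) = \frac{\partial G}{\partial x}\Delta x + \frac{1}{2}\frac{\partial^2 G}{\partial x^2}(\Delta x)^2 + \frac{1}{6}\frac{\partial^3 G}{\partial x^3}(\Delta x)^3 + \cdots.$$

Taking the limit as $\Delta x \to 0$ and ignoring the higher order terms of $\Delta x$, we have

$$dG = \frac{\partial G}{\partial x}\,dx.$$

When $G$ is a function of $x$ and $y$, we have

$$\Delta G = \frac{\partial G}{\partial x}\Delta x + \frac{\partial G}{\partial y}\Delta y + \frac{1}{2}\frac{\partial^2 G}{\partial x^2}(\Delta x)^2 + \frac{\partial^2 G}{\partial x\,\partial y}\Delta x\,\Delta y + \frac{1}{2}\frac{\partial^2 G}{\partial y^2}(\Delta y)^2 + \cdots.$$

Taking the limit as $\Delta x \to 0$ and $\Delta y \to 0$, we have

$$dG = \frac{\partial G}{\partial x}\,dx + \frac{\partial G}{\partial y}\,dy.$$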

6.3.2 Stochastic Differentiation 


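Turn next to the case in which $G$ is a differentiable function of $x_t$ and $t$, and $x_t$ is
an Ito process. The Taylor expansion becomes

$$\Delta G = \frac{\partial G}{\partial x}\Delta x + \frac{\partial G}{\partial t}\Delta t + \frac{1}{2}\frac{\partial^2 G}{\partial x^2}(\Delta x)^2 + \frac{\partial^2 G}{\partial x\,\partial t}\Delta x\,\Delta t + \frac{1}{2}\frac{\partial^2 G}{\partial t^2}(\Delta t)^2 + \cdots. \tag{6.3}$$

A discretized version of the Ito process is

$$\Delta x = \mu\,\Delta t + \sigma\epsilon\sqrt{\Delta t}, \tag{6.4}$$

where, for simplicity, we omit the arguments of $\mu$ and $\sigma$, and $\Delta x = x_{t+\Delta t} - x_t$.
From Eq. (6.4), we have

$$(\Delta x)^2 = \mu^2(\Delta t)^2 + \sigma^2\epsilon^2\,\Delta t + 2\mu\sigma\epsilon(\Delta t)^{3/2} = \sigma^2\epsilon^2\,\Delta t + H(\Delta t), \tag{6.5}$$

where $H(\Delta t)$ denotes higher order terms of $\Delta t$. This result shows that $(\Delta x)^2$
contains a term of order $\Delta t$, which cannot be ignored when we take the limit as
$\Delta t \to 0$. However, the first term on the right-hand side of Eq. (6.5) has some nice
properties: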


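$$E(\sigma^2\epsilon^2\,\Delta t) = \sigma^2\,\Delta t,$$

$$\mathrm{Var}(\sigma^2\epsilon^2\,\Delta t) = E[\sigma^4\epsilon^4(\Delta t)^2] - [E(\sigma^2\epsilon^2\,\Delta t)]^2 = 2\sigma^4(\Delta t)^2,$$

where we use $E(\epsilon^4) = 3$ for a standard normal random variable. These two
properties show that $\sigma^2\epsilon^2\,\Delta t$ converges to a nonstochastic quantity $\sigma^2\,\Delta t$ as $\Delta t \to 0$.
Consequently, from Eq. (6.5), we have

$$(\Delta x)^2 \to \sigma^2\,dt \quad \text{as } \Delta t \to 0.$$

Plugging the prior result into Eq. (6.3) and using Ito’s equation of $x_t$ in Eq. (6.2),
we obtain

$$dG = \frac{\partial G}{\partial x}\,dx + \frac{\partial G}{\partial t}\,dt + \frac{1}{2}\frac{\partial^2 G}{\partial x^2}\sigma^2\,dt = \left(\frac{\partial G}{\partial x}\mu + \frac{\partial G}{\partial t} + \frac{1}{2}\frac{\partial^2 G}{\partial x^2}\sigma^2\right)dt + \frac{\partial G}{\partial x}\sigma\,dw_t,$$

which is the well-known Ito lemma in stochastic calculus.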

Recall that we suppressed the argument $(x_t, t)$ from the drift and volatility terms
$\mu$ and $\sigma$ in the derivation of Ito’s lemma. To avoid any possible confusion in the
future, we restate the lemma as follows.


Ito’s Lemma

Assume that $x_t$ is a continuous-time stochastic process satisfying

$$dx_t = \mu(x_t, t)\,dt + \sigma(x_t, t)\,dw_t,$$

where $w_t$ is a Wiener process. Furthermore, $G(x_t, t)$ is a differentiable function of
$x_t$ and $t$. Then,

$$dG = \left[\frac{\partial G}{\partial x}\mu(x_t, t) + \frac{\partial G}{\partial t} + \frac{1}{2}\frac{\partial^2 G}{\partial x^2}\sigma^2(x_t, t)\right]dt + \frac{\partial G}{\partial x}\sigma(x_t, t)\,dw_t. \tag{6.6}$$


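Example 6.1. As a simple illustration, consider the square function $G(w_t, t) = w_t^2$
of the Wiener process. Here we have $\mu(w_t, t) = 0$, $\sigma(w_t, t) = 1$, and

$$\frac{\partial G}{\partial w_t} = 2w_t, \qquad \frac{\partial G}{\partial t} = 0, \qquad \frac{\partial^2 G}{\partial w_t^2} = 2.$$

Therefore,

$$dw_t^2 = \left(2w_t \times 0 + 0 + \tfrac{1}{2} \times 2 \times 1\right)dt + 2w_t\,dw_t = dt + 2w_t\,dw_t. \tag{6.7}$$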
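Equation (6.7) can be checked numerically on a discretized path. On a grid, $w_T^2$ equals the sum of $2w_{i-1}\Delta w_i$ (a left-endpoint approximation of $\int 2w_t\,dw_t$) plus the sum of squared increments, and the latter should be close to $T$. The following base R sketch uses an arbitrary seed and grid size.

```r
set.seed(3)                           # arbitrary seed
n  <- 100000                          # number of steps on [0, 1]
dt <- 1 / n
dw <- sqrt(dt) * rnorm(n)             # Wiener increments, each N(0, dt)
w  <- c(0, cumsum(dw))                # w_0 = 0, then w at each grid point

# Discretized stochastic integral of 2 w_t dw_t, using left endpoints
ito_sum  <- sum(2 * w[1:n] * dw)
# Sum of squared increments; plays the role of the "dt" term in Eq. (6.7)
quad_var <- sum(dw^2)

c(w_T_squared = w[n + 1]^2,
  ito_sum_plus_qv = ito_sum + quad_var,
  quad_var = quad_var)
```

The first two entries agree up to rounding because the decomposition is an algebraic identity on the grid, while `quad_var` is close to 1 only in the probabilistic sense described in Section 6.3.2.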


6.3.3 An Application 


Let $P_t$ be the price of a stock at time $t$, which is continuous in $[0, \infty)$. In the
literature, it is common to assume that $P_t$ follows the special Ito process

$$dP_t = \mu P_t\,dt + \sigma P_t\,dw_t, \tag{6.8}$$

where $\mu$ and $\sigma$ are constant. Using the notation of the general Ito process in
Eq. (6.2), we have $\mu(x_t, t) = \mu x_t$ and $\sigma(x_t, t) = \sigma x_t$, where $x_t = P_t$. Such a
special process is referred to as a geometric Brownian motion. We now apply Ito’s
lemma to obtain a continuous-time model for the logarithm of the stock price $P_t$.
Let $G(P_t, t) = \ln(P_t)$ be the log price of the underlying stock. Then we have

$$\frac{\partial G}{\partial P_t} = \frac{1}{P_t}, \qquad \frac{\partial G}{\partial t} = 0, \qquad \frac{1}{2}\frac{\partial^2 G}{\partial P_t^2} = \frac{1}{2}\,\frac{-1}{P_t^2}.$$

Consequently, via Ito’s lemma, we obtain

$$d\ln(P_t) = \left(\frac{1}{P_t}\mu P_t + \frac{1}{2}\,\frac{-1}{P_t^2}\sigma^2 P_t^2\right)dt + \frac{1}{P_t}\sigma P_t\,dw_t = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dw_t.$$


This result shows that the logarithm of a price follows a generalized Wiener process
with drift rate $\mu - \sigma^2/2$ and variance rate $\sigma^2$ if the price is a geometric Brownian
motion. Consequently, the change in logarithm of price (i.e., log return) between
current time $t$ and some future time $T$ is normally distributed with mean
$(\mu - \sigma^2/2)(T - t)$ and variance $\sigma^2(T - t)$. If the time interval $T - t = \Delta$ is fixed and
we are interested in equally spaced increments in log price, then the increment
series is a Gaussian process with mean $(\mu - \sigma^2/2)\Delta$ and variance $\sigma^2\Delta$.


6.3.4 Estimation of u and o 


The two unknown parameters $\mu$ and $\sigma$ of the geometric Brownian motion in
Eq. (6.8) can be estimated empirically. Assume that we have $n + 1$ observations of
stock price $P_t$ at equally spaced time interval $\Delta$ (e.g., daily, weekly, or monthly).
We measure $\Delta$ in years. Denote the observed prices as $\{P_0, P_1, \ldots, P_n\}$ and let
$r_t = \ln(P_t) - \ln(P_{t-1})$ for $t = 1, \ldots, n$.

Since $P_t = P_{t-1}\exp(r_t)$, $r_t$ is the continuously compounded return in the $t$th time
interval. Using the result of the previous section and assuming that the stock price
$P_t$ follows a geometric Brownian motion, we obtain that $r_t$ is normally distributed
with mean $(\mu - \sigma^2/2)\Delta$ and variance $\sigma^2\Delta$. In addition, the $r_t$ are not serially
correlated.

For simplicity, define $\mu_r = E(r_t) = (\mu - \sigma^2/2)\Delta$ and $\sigma_r^2 = \mathrm{var}(r_t) = \sigma^2\Delta$.
Let $\bar{r}$ and $s_r$ be the sample mean and standard deviation of the data, that is,

$$\bar{r} = \frac{\sum_{t=1}^{n} r_t}{n}, \qquad s_r = \sqrt{\frac{1}{n-1}\sum_{t=1}^{n}(r_t - \bar{r})^2}.$$

As mentioned in Chapter 1, $\bar{r}$ and $s_r$ are consistent estimates of the mean and
standard deviation of $r_t$, respectively. That is, $\bar{r} \to \mu_r$ and $s_r \to \sigma_r$ as $n \to \infty$.
Therefore, we may estimate $\sigma$ by

$$\hat{\sigma} = \frac{s_r}{\sqrt{\Delta}}.$$

Furthermore, it can be shown that the standard error of this estimate is
approximately $\hat{\sigma}/\sqrt{2n}$. From $\hat{\mu}_r = \bar{r}$, we can estimate $\mu$ by

$$\hat{\mu} = \frac{\bar{r}}{\Delta} + \frac{\hat{\sigma}^2}{2} = \frac{\bar{r}}{\Delta} + \frac{s_r^2}{2\Delta}.$$

When the series $r_t$ is serially correlated or when the price of the asset does not
follow the geometric Brownian motion in Eq. (6.8), then other estimation methods
must be used to estimate the drift and volatility parameters of the diffusion equation.
We return to this issue later.
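The estimation steps of this subsection translate directly into a few lines of R. The sketch below is illustrative only: `gbm_estimate` is a hypothetical helper, and the prices are simulated under assumed values $\mu = 0.10$ and $\sigma = 0.30$ rather than taken from any data file of the book.

```r
# Estimate mu and sigma of a geometric Brownian motion from prices observed
# at equally spaced intervals of length delta (measured in years).
gbm_estimate <- function(prices, delta) {
  r     <- diff(log(prices))               # continuously compounded returns r_t
  n     <- length(r)
  sigma <- sd(r) / sqrt(delta)             # sigma-hat = s_r / sqrt(delta)
  mu    <- mean(r) / delta + sigma^2 / 2   # mu-hat = r-bar / delta + sigma-hat^2 / 2
  list(mu = mu, sigma = sigma, se_sigma = sigma / sqrt(2 * n))
}

# Hypothetical check on 10 years of simulated daily prices
set.seed(4)                                # arbitrary seed
delta  <- 1 / 252
r_sim  <- rnorm(2520, mean = (0.10 - 0.30^2 / 2) * delta,
                sd = 0.30 * sqrt(delta))
prices <- 100 * exp(c(0, cumsum(r_sim)))
est <- gbm_estimate(prices, delta)
est
```

In such experiments the volatility estimate is typically much more precise than the drift estimate, whose accuracy depends mainly on the total time span of the data rather than on the number of observations.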


Example 6.2. Consider the daily log returns of IBM stock in 1998.
Figure 6.2(a) shows the time plot of the data, which have 252 observations.
Figure 6.2(b) shows the sample autocorrelations of the series. It is seen that
the log returns are indeed serially uncorrelated. The Ljung-Box statistic
gives $Q(10) = 4.9$, which is highly insignificant compared with a chi-squared
distribution with 10 degrees of freedom.

Figure 6.2 Daily returns of IBM stock in 1998: (a) log returns and (b) sample autocorrelations.

If we assume that the price of IBM stock in 1998 follows the geometric Brownian
motion in Eq. (6.8), then we can use the daily log returns to estimate the parameters
$\mu$ and $\sigma$. From the data, we have $\bar{r} = 0.002276$ and $s_r = 0.01915$. Since 1 trading
day is equivalent to $\Delta = 1/252$ year, we obtain that

$$\hat{\sigma} = \frac{s_r}{\sqrt{\Delta}} = \frac{0.01915}{\sqrt{1.0/252.0}} = 0.3040, \qquad \hat{\mu} = \frac{\bar{r}}{\Delta} + \frac{\hat{\sigma}^2}{2} = 0.6198.$$

Thus, the estimated expected return was 61.98% and the standard deviation was
30.4% per annum for IBM stock in 1998.

The normality assumption of the daily log returns may not hold, however. In this
particular instance, the skewness $-0.464$ (0.153) and excess kurtosis 2.396 (0.306)
raise some concern, where the number in parentheses denotes asymptotic standard
error.


Example 6.3. Consider the daily log return of the stock of Cisco Systems, Inc.
in 2007. There are 251 observations, and the sample mean and standard deviation
are $-3.81 \times 10^{-5}$ and 0.0174, respectively. The log return series also shows no
serial correlation with $Q(12) = 12.30$ with a $p$ value of 0.42. Therefore, we have

$$\hat{\sigma} = \frac{s_r}{\sqrt{\Delta}} = \frac{0.0174}{\sqrt{1.0/251.0}} = 0.275, \qquad \hat{\mu} = \frac{\bar{r}}{\Delta} + \frac{\hat{\sigma}^2}{2}.$$

Consequently, the estimated expected log return for Cisco Systems’ stock was
$-0.94\%$ per annum, and the estimated standard deviation was 27.5% per annum
in 2007.


6.4 DISTRIBUTIONS OF STOCK PRICES AND LOG RETURNS 


The result of the previous section shows that if one assumes that the price of a stock
follows the geometric Brownian motion

$$dP_t = \mu P_t\,dt + \sigma P_t\,dw_t,$$

then the logarithm of the price follows a generalized Wiener process

$$d\ln(P_t) = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dw_t,$$

where $P_t$ is the price of the stock at time $t$ and $w_t$ is a Wiener process. Therefore,
the change in log price from time $t$ to $T$ is normally distributed as

$$\ln(P_T) - \ln(P_t) \sim N\left[\left(\mu - \frac{\sigma^2}{2}\right)(T - t),\ \sigma^2(T - t)\right]. \tag{6.9}$$

Consequently, conditional on the price $P_t$ at time $t$, the log price at time $T > t$ is
normally distributed as

$$\ln(P_T) \sim N\left[\ln(P_t) + \left(\mu - \frac{\sigma^2}{2}\right)(T - t),\ \sigma^2(T - t)\right]. \tag{6.10}$$

Using the result of lognormal distribution discussed in Chapter 1, we obtain the
(conditional) mean and variance of $P_T$ as

$$E(P_T) = P_t \exp[\mu(T - t)],$$

$$\mathrm{Var}(P_T) = P_t^2 \exp[2\mu(T - t)]\{\exp[\sigma^2(T - t)] - 1\}.$$

Note that the expectation confirms that $\mu$ is the expected rate of return of the stock.

The preceding distribution of the stock price can be used to make inferences. For
example, suppose that the current price of stock A is \$50, the expected return of the
stock is 15% per annum, and the volatility is 40% per annum. Then the expected
price of stock A in 6 months (0.5 year) and the associated variance are given by

$$E(P_T) = 50\exp(0.15 \times 0.5) = 53.89,$$

$$\mathrm{Var}(P_T) = 2500\exp(0.3 \times 0.5)[\exp(0.16 \times 0.5) - 1] = 241.92.$$

The standard deviation of the price 6 months from now is $\sqrt{241.92} = 15.55$.

Next, let $r$ be the continuously compounded rate of return per annum from time
$t$ to $T$. Then we have

$$P_T = P_t \exp[r(T - t)],$$

where $T$ and $t$ are measured in years. Therefore,

$$r = \frac{1}{T - t}\ln\left(\frac{P_T}{P_t}\right).$$

By Eq. (6.9), we have

$$\ln\left(\frac{P_T}{P_t}\right) \sim N\left[\left(\mu - \frac{\sigma^2}{2}\right)(T - t),\ \sigma^2(T - t)\right].$$

Consequently, the distribution of the continuously compounded rate of return per
annum is

$$r \sim N\left(\mu - \frac{\sigma^2}{2},\ \frac{\sigma^2}{T - t}\right).$$

The continuously compounded rate of return is, therefore, normally distributed with
mean $\mu - \sigma^2/2$ and standard deviation $\sigma/\sqrt{T - t}$.

Consider a stock with an expected rate of return of 15% per annum and a
volatility of 10% per annum. The distribution of the continuously compounded
rate of return of the stock over 2 years is normal with mean $0.15 - 0.01/2 = 0.145$
or 14.5% per annum and standard deviation $0.1/\sqrt{2} = 0.071$ or 7.1% per annum.
These results allow us to construct confidence intervals (CI) for $r$. For instance,
a 95% CI for $r$ is $0.145 \pm 1.96 \times 0.071$ per annum (i.e., from 0.6% to 28.4%).
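The two numerical illustrations of this section are easy to reproduce in R. The helper names `gbm_price_moments` and `gbm_return_ci` below are hypothetical and serve only as a sketch; `tau` stands for $T - t$ in years.

```r
# Conditional mean, variance, and standard deviation of P_T under
# geometric Brownian motion, given the current price P_t
gbm_price_moments <- function(P_t, mu, sigma, tau) {
  m <- P_t * exp(mu * tau)
  v <- P_t^2 * exp(2 * mu * tau) * (exp(sigma^2 * tau) - 1)
  c(mean = m, var = v, sd = sqrt(v))
}

# Normal confidence interval for the annualized continuously compounded return
gbm_return_ci <- function(mu, sigma, tau, level = 0.95) {
  center <- mu - sigma^2 / 2
  se     <- sigma / sqrt(tau)
  z      <- qnorm(1 - (1 - level) / 2)
  c(lower = center - z * se, upper = center + z * se)
}

mom <- gbm_price_moments(50, 0.15, 0.40, 0.5)   # stock A over 6 months
ci  <- gbm_return_ci(0.15, 0.10, 2)             # 2-year horizon
round(mom, 2)
round(ci, 3)
```

Because `qnorm` uses the exact normal quantile and the unrounded standard deviation, the interval endpoints may differ from the hand calculation in the third decimal place.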


6.5 DERIVATION OF BLACK-SCHOLES DIFFERENTIAL EQUATION 


In this section, we use Ito’s lemma and assume no arbitrage to derive the
Black-Scholes differential equation for the price of a derivative contingent on
a stock valued at $P_t$. Assume that the price $P_t$ follows the geometric Brownian
motion in Eq. (6.8) and $G_t = G(P_t, t)$ is the price of a derivative (e.g., a call
option) contingent on $P_t$. By Ito’s lemma,

$$dG_t = \left(\frac{\partial G_t}{\partial P_t}\mu P_t + \frac{\partial G_t}{\partial t} + \frac{1}{2}\frac{\partial^2 G_t}{\partial P_t^2}\sigma^2 P_t^2\right)dt + \frac{\partial G_t}{\partial P_t}\sigma P_t\,dw_t.$$

The discretized versions of the process and previous result are

$$\Delta P_t = \mu P_t\,\Delta t + \sigma P_t\,\Delta w_t, \tag{6.11}$$

$$\Delta G_t = \left(\frac{\partial G_t}{\partial P_t}\mu P_t + \frac{\partial G_t}{\partial t} + \frac{1}{2}\frac{\partial^2 G_t}{\partial P_t^2}\sigma^2 P_t^2\right)\Delta t + \frac{\partial G_t}{\partial P_t}\sigma P_t\,\Delta w_t, \tag{6.12}$$

where $\Delta P_t$ and $\Delta G_t$ are changes in $P_t$ and $G_t$ in a small time interval $\Delta t$. Because
$\Delta w_t = \epsilon\sqrt{\Delta t}$ for both Eqs. (6.11) and (6.12), one can construct a portfolio of the
stock and the derivative that does not involve the Wiener process. The appropriate
portfolio is short on the derivative and long $\partial G_t/\partial P_t$ shares of the stock. Denote the
value of the portfolio by $V_t$. By construction,

$$V_t = -G_t + \frac{\partial G_t}{\partial P_t}P_t. \tag{6.13}$$

The change in $V_t$ is then

$$\Delta V_t = -\Delta G_t + \frac{\partial G_t}{\partial P_t}\Delta P_t. \tag{6.14}$$

Substituting Eqs. (6.11) and (6.12) into Eq. (6.14), we have

$$\Delta V_t = \left(-\frac{\partial G_t}{\partial t} - \frac{1}{2}\frac{\partial^2 G_t}{\partial P_t^2}\sigma^2 P_t^2\right)\Delta t. \tag{6.15}$$

This equation does not involve the stochastic component $\Delta w_t$. Therefore, under
the no arbitrage assumption, the portfolio $V_t$ must be riskless during the small
time interval $\Delta t$. In other words, the assumptions used imply that the portfolio
must instantaneously earn the same rate of return as other short-term, risk-free
securities. Otherwise there exists an arbitrage opportunity between the portfolio
and the short-term, risk-free securities. Consequently, we have

$$\Delta V_t = rV_t\,\Delta t = (r\,\Delta t)V_t, \tag{6.16}$$

where $r$ is the risk-free interest rate. By Eqs. (6.13)-(6.16), we have

$$\left(\frac{\partial G_t}{\partial t} + \frac{1}{2}\frac{\partial^2 G_t}{\partial P_t^2}\sigma^2 P_t^2\right)\Delta t = r\left(G_t - \frac{\partial G_t}{\partial P_t}P_t\right)\Delta t.$$

Therefore,

$$\frac{\partial G_t}{\partial t} + rP_t\frac{\partial G_t}{\partial P_t} + \frac{1}{2}\sigma^2 P_t^2\frac{\partial^2 G_t}{\partial P_t^2} = rG_t. \tag{6.17}$$

This is the Black-Scholes differential equation for derivative pricing. It can be
solved to obtain the price of a derivative with $P_t$ as the underlying variable. The
solution so obtained depends on the boundary conditions of the derivative. For a
European call option, the boundary condition is

$$G_T = \max(P_T - K, 0),$$

where $T$ is the expiration time and $K$ is the strike price. For a European put option,
the boundary condition becomes

$$G_T = \max(K - P_T, 0).$$


Example 6.4. As a simple example, consider a forward contract on a stock
that pays no dividend. In this case, the value of the contract is given by

$$G_t = P_t - K\exp[-r(T - t)],$$

where $K$ is the delivery price, $r$ is the risk-free interest rate, and $T$ is the expiration
time. For such a function, we have

$$\frac{\partial G_t}{\partial t} = -rK\exp[-r(T - t)], \qquad \frac{\partial G_t}{\partial P_t} = 1, \qquad \frac{\partial^2 G_t}{\partial P_t^2} = 0.$$

Substituting these quantities into the left-hand side of Eq. (6.17) yields

$$-rK\exp[-r(T - t)] + rP_t = r\{P_t - K\exp[-r(T - t)]\},$$

which equals the right-hand side of Eq. (6.17). Thus, the Black-Scholes differential
equation is indeed satisfied.
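The same verification can be done numerically by approximating the partial derivatives in Eq. (6.17) with central finite differences. In the sketch below all numerical inputs (`K`, `r`, `Tm`, `sigma`, the evaluation point, and the step `h`) are hypothetical values chosen for illustration.

```r
# Value of a forward contract: G(P, t) = P - K exp(-r (Tm - t))
K <- 50; r <- 0.05; Tm <- 1; sigma <- 0.3      # hypothetical inputs
G <- function(P, t) P - K * exp(-r * (Tm - t))

P <- 55; t <- 0.4                              # point at which to check the PDE
h <- 1e-3                                      # finite-difference step

G_t  <- (G(P, t + h) - G(P, t - h)) / (2 * h)            # dG/dt
G_P  <- (G(P + h, t) - G(P - h, t)) / (2 * h)            # dG/dP
G_PP <- (G(P + h, t) - 2 * G(P, t) + G(P - h, t)) / h^2  # d2G/dP2

lhs <- G_t + r * P * G_P + 0.5 * sigma^2 * P^2 * G_PP    # left side of Eq. (6.17)
rhs <- r * G(P, t)                                       # right side of Eq. (6.17)
c(lhs = lhs, rhs = rhs)
```

The two sides agree up to finite-difference and rounding error, and the value of `sigma` has essentially no effect on the check because the second derivative of the forward value with respect to the price is zero.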


6.6 BLACK-SCHOLES PRICING FORMULAS 


Black and Scholes (1973) successfully solved their differential equation in Eq.
(6.17) to obtain exact formulas for the price of European call and put options. In
what follows, we derive these formulas using what is called risk-neutral valuation
in finance.


6.6.1 Risk-Neutral World 


The drift parameter $\mu$ drops out from the Black-Scholes differential equation. In
finance, this means the equation is independent of risk preferences. In other words,
risk preferences cannot affect the solution of the equation. A nice consequence of
this property is that one can assume that investors are risk neutral. In a risk-neutral
world, we have the following results:

• The expected return on all securities is the risk-free interest rate $r$.


• The present value of any cash flow can be obtained by discounting its expected
value at the risk-free rate.


6.6.2 Formulas 


The expected value of a European call option at maturity in a risk-neutral world is

$$E_*[\max(P_T - K, 0)],$$

where $E_*$ denotes expected value in a risk-neutral world. The price of the call
option at time $t$ is

$$c_t = \exp[-r(T - t)]\,E_*[\max(P_T - K, 0)]. \tag{6.18}$$

Yet in a risk-neutral world, we have $\mu = r$, and by Eq. (6.10), $\ln(P_T)$ is normally
distributed as

$$\ln(P_T) \sim N\left[\ln(P_t) + \left(r - \frac{\sigma^2}{2}\right)(T - t),\ \sigma^2(T - t)\right].$$

Let $g(P_T)$ be the probability density function of $P_T$. Then the price of the call
option in Eq. (6.18) is

$$c_t = \exp[-r(T - t)]\int_K^{\infty}(P_T - K)\,g(P_T)\,dP_T.$$

By changing the variable in the integration and some algebraic calculations (details
are given in Appendix A), we have

$$c_t = P_t\,\Phi(h_+) - K\exp[-r(T - t)]\,\Phi(h_-), \tag{6.19}$$

where $\Phi(x)$ is the cumulative distribution function (CDF) of the standard normal
random variable evaluated at $x$,

$$h_+ = \frac{\ln(P_t/K) + (r + \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}},$$

$$h_- = \frac{\ln(P_t/K) + (r - \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}} = h_+ - \sigma\sqrt{T - t}.$$

In practice, $\Phi(x)$ can easily be obtained from most statistical packages.
Alternatively, one can use an approximation given in Appendix B.
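In R, $\Phi(x)$ is available as `pnorm(x)`. As an informal check of the derivation, the closed form in Eq. (6.19) can be compared with a Monte Carlo estimate of Eq. (6.18), in which $P_T$ is drawn from the risk-neutral lognormal distribution. The sketch below uses hypothetical inputs and an arbitrary seed; `bs_call` is an illustrative helper, and `tau` denotes $T - t$.

```r
# Black-Scholes call price, Eq. (6.19)
bs_call <- function(P, K, r, sigma, tau) {
  h_plus  <- (log(P / K) + (r + sigma^2 / 2) * tau) / (sigma * sqrt(tau))
  h_minus <- h_plus - sigma * sqrt(tau)
  P * pnorm(h_plus) - K * exp(-r * tau) * pnorm(h_minus)
}

# Monte Carlo version of Eq. (6.18): discounted risk-neutral expected payoff
set.seed(5)                                                # arbitrary seed
P <- 100; K <- 105; r <- 0.05; sigma <- 0.25; tau <- 0.5   # hypothetical inputs
z   <- rnorm(200000)
# ln(P_T) is normal with mean ln(P) + (r - sigma^2/2) tau and variance sigma^2 tau
P_T <- P * exp((r - sigma^2 / 2) * tau + sigma * sqrt(tau) * z)
mc  <- exp(-r * tau) * mean(pmax(P_T - K, 0))

c(monte_carlo = mc, formula = bs_call(P, K, r, sigma, tau))
```

The two numbers should agree up to simulation error, which shrinks at the usual rate of one over the square root of the number of draws.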

The Black-Scholes call formula in Eq. (6.19) has some nice interpretations.
First, if we exercise the call option on the expiration date, we receive the stock,
but we have to pay the strike price. This exchange will take place only when the call
finishes in-the-money (i.e., $P_T > K$). The first term $P_t\Phi(h_+)$ is the present value
of receiving the stock if and only if $P_T > K$ and the second term
$-K\exp[-r(T - t)]\Phi(h_-)$ is the present value of paying the strike price if and only if $P_T > K$.
A second interpretation is particularly useful. As shown in the derivation of the
Black-Scholes differential equation in Section 6.5, $\Phi(h_+) = \partial G_t/\partial P_t$ is the
number of shares in the portfolio that does not involve uncertainty, the Wiener process.
This quantity is known as the delta in hedging. We know that $c_t = P_t\Phi(h_+) + B_t$,
where $B_t$ is the dollar amount invested in risk-free bonds in the portfolio (or
short on the derivative). We can then see that $B_t = -K\exp[-r(T - t)]\Phi(h_-)$
directly from inspection of the Black-Scholes formula. The first term of the
formula, $P_t\Phi(h_+)$, is the amount invested in the stock, whereas the second term,
$K\exp[-r(T - t)]\Phi(h_-)$, is the amount borrowed.
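The delta interpretation can be checked numerically by differentiating the call price with respect to the stock price and comparing the result with $\Phi(h_+)$. The sketch below uses a central finite difference; the inputs are hypothetical and `bs_call` is an illustrative helper, with `tau` denoting $T - t$.

```r
# Black-Scholes call price, Eq. (6.19)
bs_call <- function(P, K, r, sigma, tau) {
  h_plus  <- (log(P / K) + (r + sigma^2 / 2) * tau) / (sigma * sqrt(tau))
  h_minus <- h_plus - sigma * sqrt(tau)
  P * pnorm(h_plus) - K * exp(-r * tau) * pnorm(h_minus)
}

P <- 100; K <- 95; r <- 0.05; sigma <- 0.2; tau <- 0.5   # hypothetical inputs
h_plus  <- (log(P / K) + (r + sigma^2 / 2) * tau) / (sigma * sqrt(tau))
h_minus <- h_plus - sigma * sqrt(tau)

delta_formula <- pnorm(h_plus)                 # Phi(h_+)
eps <- 1e-4                                    # finite-difference step
delta_numeric <- (bs_call(P + eps, K, r, sigma, tau) -
                  bs_call(P - eps, K, r, sigma, tau)) / (2 * eps)

# Bond position B_t implied by c_t = P_t * Phi(h_+) + B_t
bond <- bs_call(P, K, r, sigma, tau) - P * delta_formula
c(delta_formula = delta_formula, delta_numeric = delta_numeric, bond = bond)
```

The bond position is negative, which corresponds to borrowing $K\exp[-r(T - t)]\Phi(h_-)$ to finance part of the stock holding.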

Similarly, we can obtain the price of a European put option as

$$p_t = K\exp[-r(T - t)]\,\Phi(-h_-) - P_t\,\Phi(-h_+). \tag{6.20}$$

Since the standard normal distribution is symmetric with respect to its mean 0.0,
we have $\Phi(x) = 1 - \Phi(-x)$ for all $x$. Using this property, we have
$\Phi(-h_i) = 1 - \Phi(h_i)$. Thus, the information needed to compute the price of a put option is
the same as that of a call option. Alternatively, using the symmetry of normal
distribution, it is easy to verify that

$$p_t - c_t = K\exp[-r(T - t)] - P_t,$$

which is referred to as the put-call parity and can be used to obtain $p_t$ from $c_t$. The
put-call parity can also be obtained by considering the following two portfolios:


1. Portfolio A. One European call option plus an amount of cash equal to
$K\exp[-r(T - t)]$.
2. Portfolio B. One European put option plus one share of the underlying stock.


The payoff of these two portfolios is

$$\max(P_T, K)$$

at the expiration of the options. Since the options can only be exercised at the
expiration date, the portfolios must have identical value today. This means

$$c_t + K\exp[-r(T - t)] = p_t + P_t,$$

which is the put-call parity given earlier.


Example 6.5. Suppose that the current price of Intel stock is \$80 per share
with volatility $\sigma = 20\%$ per annum. Suppose further that the risk-free interest rate
is 8% per annum. What is the price of a European call option on Intel with a strike
price of \$90 that will expire in 3 months?

From the assumptions, we have $P_t = 80$, $K = 90$, $T - t = 0.25$, $\sigma = 0.2$, and
$r = 0.08$. Therefore,

$$h_+ = \frac{\ln(80/90) + (0.08 + 0.04/2) \times 0.25}{0.2\sqrt{0.25}} = -0.9278,$$

$$h_- = h_+ - 0.2\sqrt{0.25} = -1.0278.$$

Using any statistical software (e.g., R or S-Plus) or the approximation in Appendix
B, we have

$$\Phi(-0.9278) = 0.1767, \qquad \Phi(-1.0278) = 0.1520.$$

Consequently, the price of a European call option is

$$c_t = \$80\,\Phi(-0.9278) - \$90\,\Phi(-1.0278)\exp(-0.02) = \$0.73.$$

The stock price has to rise by \$10.73 for the purchaser of the call option to break
even.

Under the same assumptions, the price of a European put option is

$$p_t = \$90\exp(-0.08 \times 0.25)\,\Phi(1.0278) - \$80\,\Phi(0.9278) = \$8.95.$$

Thus, the stock price can rise an additional \$1.05 for the purchaser of the put option
to break even.


Example 6.6. The strike price of the previous example is well beyond the
current stock price. A more realistic strike price is $81. Assume that the other
conditions of the previous example continue to hold. We now have \(P_t = 80\),
\(K = 81\), \(r = 0.08\), and \(T - t = 0.25\), and the \(h_\pm\) become

\[ h_+ = \frac{\ln(80/81) + (0.08 + 0.04/2) \times 0.25}{0.2\sqrt{0.25}} = 0.125775, \]

\[ h_- = h_+ - 0.2\sqrt{0.25} = 0.025775. \]

Using the approximation in Appendix B, we have \(\Phi(0.125775) = 0.5500\) and
\(\Phi(0.025775) = 0.5103\). The price of a European call option is then

\[ c_t = \$80\,\Phi(0.125775) - \$81\exp(-0.02)\,\Phi(0.025775) = \$3.49. \]

The price of the stock has to rise by $4.49 for the purchaser of the call option to
break even. On the other hand, under the same assumptions, the price of a European
put option is

\[ p_t = \$81\exp(-0.02)\,\Phi(-0.025775) - \$80\,\Phi(-0.125775) \]
\[ \phantom{p_t} = \$81\exp(-0.02) \times 0.48972 - \$80 \times 0.44996 = \$2.89. \]


The stock price must fall $1.89 for the purchaser of the put option to break 
even. 
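
The arithmetic of Examples 6.5 and 6.6 can be checked numerically. The following Python sketch is only an illustration: the helper names `norm_cdf` and `bs_prices` are ours, and the normal CDF is computed from the error function rather than from the approximation in Appendix B.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_prices(P, K, r, sigma, tau):
    """Return (call, put) Black-Scholes prices; tau is time to expiration in years."""
    h_plus = (log(P / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    h_minus = h_plus - sigma * sqrt(tau)
    disc = exp(-r * tau)
    call = P * norm_cdf(h_plus) - K * disc * norm_cdf(h_minus)
    put = K * disc * norm_cdf(-h_minus) - P * norm_cdf(-h_plus)   # Eq. (6.20)
    return call, put

# Strike prices of Examples 6.5 and 6.6; other inputs as in the text.
for K in (90.0, 81.0):
    c, p = bs_prices(80.0, K, 0.08, 0.2, 0.25)
    parity_gap = (p - c) - (K * exp(-0.08 * 0.25) - 80.0)   # put-call parity residual
    print(f"K={K:.0f}: call={c:.2f}, put={p:.2f}, parity gap={parity_gap:.1e}")
```

Up to rounding, the printed prices should agree with the values reported in the two examples, and the parity residual should vanish to machine precision.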
6.6.3 Lower Bounds of European Options


Consider the call option of a nondividend-paying stock. It can be shown that the
price of a European call option satisfies

\[ c_t \ge P_t - K\exp[-r(T-t)]; \]

that is, the lower bound for a European call price is \(P_t - K\exp[-r(T-t)]\). This
result can be verified by considering two portfolios:

1. Portfolio A. One European call option plus an amount of cash equal to
\(K\exp[-r(T-t)]\).
2. Portfolio B. One share of the stock.

For portfolio A, if the cash is invested at the risk-free interest rate, it will result in
\(K\) at time \(T\). If \(P_T > K\), the call option is exercised at time \(T\) and the portfolio is
worth \(P_T\). If \(P_T < K\), the call option expires worthless and the portfolio is worth
\(K\). Therefore, the value of portfolio A at time \(T\) is

\[ \max(P_T, K). \]

The value of portfolio B is \(P_T\) at time \(T\). Hence, at expiration portfolio A is always
worth more than (or, at least, equal to) portfolio B. It follows that portfolio A must
be worth at least as much as portfolio B today; that is,

\[ c_t + K\exp[-r(T-t)] \ge P_t, \quad \text{or} \quad c_t \ge P_t - K\exp[-r(T-t)]. \]

Furthermore, since \(c_t \ge 0\), we have

\[ c_t \ge \max(P_t - K\exp[-r(T-t)],\, 0). \]


A similar approach can be used to show that the price of a corresponding
European put option satisfies

\[ p_t \ge \max\{K\exp[-r(T-t)] - P_t,\, 0\}. \]


Example 6.7. Suppose that \(P_t = \$30\), \(K = \$28\), \(r = 6\%\) per annum, and
\(T - t = 0.5\). In this case,

\[ P_t - K\exp[-r(T-t)] = \$[30 - 28\exp(-0.06 \times 0.5)] \approx \$2.83. \]

Assume that the European call price of the stock is $2.50, which is less than the
theoretical minimum of $2.83. An arbitrageur can buy the call option and short the
stock. This provides a new cash flow of $(30 - 2.50) = $27.50. If invested for 6
months at the risk-free interest rate, the $27.50 grows to $27.50 exp(0.06 x 0.5) =
$28.34. At the expiration time, if \(P_T > \$28\), the arbitrageur exercises the option,
closes out the short position, and makes a profit of $(28.34 - 28) = $0.34. On
the other hand, if \(P_T < \$28\), the stock is bought in the market to close the short
position. The arbitrageur then makes an even greater profit. For illustration, suppose
that \(P_T = \$27.00\); then the profit is $(28.34 - 27.00) = $1.34.
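
The cash flows of the arbitrage in Example 6.7 can be tabulated with a few lines of Python. This is a sketch for illustration only; the variable and function names are ours.

```python
from math import exp

P_t, K, r, tau = 30.0, 28.0, 0.06, 0.5
call_quote = 2.50                           # quoted call price in Example 6.7

lower_bound = max(P_t - K * exp(-r * tau), 0.0)
# Short the stock, buy the call, and invest the proceeds at the risk-free rate.
cash_at_T = (P_t - call_quote) * exp(r * tau)

def arbitrage_profit(P_T):
    """Profit at expiration: pay K if the call is exercised, else buy the stock at P_T."""
    return cash_at_T - min(P_T, K)

print(f"lower bound = {lower_bound:.2f}, cash at T = {cash_at_T:.2f}")
for P_T in (27.0, 28.0, 35.0):
    print(f"P_T = {P_T:.2f}: profit = {arbitrage_profit(P_T):.2f}")
```

The cost of closing the short position is capped at K by the call option, so the profit is never below the value attained when the option is exercised.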


6.6.4 Discussion 


From the formulas, the price of a call or put option depends on five
variables, namely, the current stock price \(P_t\), the strike price \(K\), the time to
expiration \(T - t\) measured in years, the volatility \(\sigma\) per annum, and the interest
rate \(r\) per annum. It pays to study the effects of these five variables on the price
of an option.


Marginal Effects 

Consider first the marginal effects of the five variables on the price of a call option
\(c_t\). By marginal effects we mean changing one variable while holding the others
fixed. The effects on a call option can be summarized as follows:

1. Current Stock Price \(P_t\). \(c_t\) is positively related to \(\ln(P_t)\). In particular, \(c_t \to 0\)
as \(P_t \to 0\) and \(c_t \to \infty\) as \(P_t \to \infty\). Figure 6.3(a) illustrates the effects with
\(K = 80\), \(r = 6\%\) per annum, \(T - t = 0.25\) year, and \(\sigma = 30\%\) per annum.

Figure 6.3 Marginal effects of current stock price on price of an option with \(K = 80\), \(T - t = 0.25\),
\(\sigma = 0.3\), and \(r = 0.06\): (a) call option and (b) put option.

2. Strike Price \(K\). \(c_t\) is negatively related to \(\ln(K)\). In particular, \(c_t \to P_t\) as
\(K \to 0\) and \(c_t \to 0\) as \(K \to \infty\).


3. Time to Expiration. \(c_t\) is related to \(T - t\) in a complicated manner, but we
can obtain the limiting results by writing \(h_+\) and \(h_-\) as

\[ h_+ = \frac{\ln(P_t/K)}{\sigma\sqrt{T-t}} + \frac{(r + \sigma^2/2)\sqrt{T-t}}{\sigma}, \]

\[ h_- = \frac{\ln(P_t/K)}{\sigma\sqrt{T-t}} + \frac{(r - \sigma^2/2)\sqrt{T-t}}{\sigma}. \]

If \(P_t < K\), then \(c_t \to 0\) as \((T - t) \to 0\). If \(P_t > K\), then \(c_t \to P_t - K\) as
\((T - t) \to 0\) and \(c_t \to P_t\) as \((T - t) \to \infty\). Figure 6.4(a) shows the marginal
effects of \(T - t\) on \(c_t\) for three different current stock prices. The fixed
variables are \(K = 80\), \(r = 6\%\), and \(\sigma = 30\%\). The solid, dotted, and dashed
lines of the plot are for \(P_t = 70\), 80, and 90, respectively.


4. Volatility \(\sigma\). Rewriting \(h_+\) and \(h_-\) as

\[ h_+ = \frac{\ln(P_t/K) + r(T-t)}{\sigma\sqrt{T-t}} + \frac{\sigma}{2}\sqrt{T-t}, \]

\[ h_- = \frac{\ln(P_t/K) + r(T-t)}{\sigma\sqrt{T-t}} - \frac{\sigma}{2}\sqrt{T-t}, \]

we obtain that (a) if \(\ln(P_t/K) + r(T - t) < 0\), then \(c_t \to 0\) as \(\sigma \to 0\), and
(b) if \(\ln(P_t/K) + r(T - t) \ge 0\), then \(c_t \to P_t - Ke^{-r(T-t)}\) as \(\sigma \to 0\) and
\(c_t \to P_t\) as \(\sigma \to \infty\). Figure 6.5(a) shows the effects of \(\sigma\) on \(c_t\) for \(K = 80\),
\(T - t = 0.25\), \(r = 0.06\), and three different values of \(P_t\). The solid, dotted,
and dashed lines are for \(P_t = 70\), 80, and 90, respectively.

Figure 6.4 Marginal effects of time to expiration on price of an option with \(K = 80\), \(\sigma = 0.3\), and
\(r = 0.06\): (a) call option and (b) put option. Solid, dotted, and dashed lines are for current stock price
\(P_t = 70\), 80, and 90, respectively.

5. Interest Rate. \(c_t\) is positively related to \(r\) such that \(c_t \to P_t\) as \(r \to \infty\).


The marginal effects of the five variables on a put option can be obtained
similarly. Figures 6.3(b), 6.4(b), and 6.5(b) illustrate the effects for some selected
cases.
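
The limiting results for the volatility can be illustrated numerically. The sketch below uses the Black-Scholes call formula with \(K = 80\), \(r = 0.06\), and \(T - t = 0.25\); the function name `bs_call` and the particular small and large values of \(\sigma\) are our own choices for illustration.

```python
from math import erf, exp, log, sqrt

def bs_call(P, K, r, sigma, tau):
    """Black-Scholes price of a European call with time to expiration tau."""
    Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    h_plus = (log(P / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    h_minus = h_plus - sigma * sqrt(tau)
    return P * Phi(h_plus) - K * exp(-r * tau) * Phi(h_minus)

K, r, tau = 80.0, 0.06, 0.25
for P in (70.0, 90.0):
    sign = log(P / K) + r * tau          # decides which limit applies as sigma -> 0
    small = bs_call(P, K, r, 1e-3, tau)  # sigma close to 0
    large = bs_call(P, K, r, 50.0, tau)  # very large sigma
    print(f"P={P:.0f}: ln(P/K)+r*tau={sign:+.4f}, "
          f"c(sigma~0)={small:.4f}, c(sigma large)={large:.4f}")
print("P - K*exp(-r*tau) for P=90:", round(90.0 - K * exp(-r * tau), 4))
```

For \(P_t = 70\) the price collapses to zero as \(\sigma\) shrinks, whereas for \(P_t = 90\) it approaches \(P_t - Ke^{-r(T-t)}\); in both cases the price approaches \(P_t\) when \(\sigma\) is very large.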


Some Joint Effects 

Figure 6.6 shows the joint effects of volatility and strike price on a call option,
where the other variables are fixed at \(P_t = 80\), \(r = 0.06\), and \(T - t = 0.25\). As
expected, the price of a call option is higher when the volatility is high and the
strike price is well below the current stock price. Figure 6.7 shows the effects on
a put option under the same conditions. The price of a put option is higher when
the volatility is high and the strike price is well above the current stock price.
Furthermore, the plot also shows that the effect of the strike price on the price of a
put option becomes more linear as the volatility increases.

Figure 6.5 Marginal effects of stock volatility on price of an option with \(K = 80\), \(T - t = 0.25\), and
\(r = 0.06\): (a) call option and (b) put option. Solid, dotted, and dashed lines are for current stock price
\(P_t = 70\), 80, and 90, respectively.

Figure 6.6 Joint effects of stock volatility and strike price on call option with \(P_t = 80\), \(r = 0.06\), and
\(T - t = 0.25\).

Figure 6.7 Joint effects of stock volatility and strike price on put option with \(P_t = 80\), \(T - t = 0.25\),
and \(r = 0.06\).


6.7 EXTENSION OF ITO’S LEMMA 


In derivative pricing, a derivative may be contingent on multiple securities. When 
the prices of these securities are driven by multiple factors, the price of the deriva- 
tive is a function of several stochastic processes. The two-factor model for the term 
structure of interest rate is an example of two stochastic processes. In this section, 
we briefly discuss the extension of Ito’s lemma to the case of several stochastic 
processes. 

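Consider a k-dimensional continuous-time process \(x_t = (x_{1t}, \ldots, x_{kt})'\), where \(k\)
is a positive integer and \(x_{it}\) is a continuous-time stochastic process satisfying

\[ dx_{it} = \mu_i(x_{it})\,dt + \sigma_i(x_{it})\,dw_{it}, \quad i = 1, \ldots, k, \tag{6.21} \]

where \(w_{it}\) is a Wiener process. It is understood that the drift and volatility functions
\(\mu_i(x_{it})\) and \(\sigma_i(x_{it})\) are functions of time index \(t\) as well. We omit \(t\) from their
arguments to simplify the notation. For \(i \ne j\), the Wiener processes \(w_{it}\) and \(w_{jt}\)
are different. We assume that the correlation between \(dw_{it}\) and \(dw_{jt}\) is \(\rho_{ij}\). This
means that \(\rho_{ij}\) is the correlation between the two standard normal random variables
\(\epsilon_i\) and \(\epsilon_j\) defined by \(\Delta w_{it} = \epsilon_i\sqrt{\Delta t}\) and \(\Delta w_{jt} = \epsilon_j\sqrt{\Delta t}\). Assume that
\(G_t = G(x_t, t)\) is a function of the stochastic processes \(x_{it}\) and time \(t\). The Taylor
expansion gives

\[ \Delta G_t = \sum_{i=1}^{k} \frac{\partial G_t}{\partial x_{it}}\,\Delta x_{it} + \frac{\partial G_t}{\partial t}\,\Delta t
 + \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k} \frac{\partial^2 G_t}{\partial x_{it}\,\partial x_{jt}}\,\Delta x_{it}\,\Delta x_{jt}
 + \frac{1}{2}\sum_{i=1}^{k} \frac{\partial^2 G_t}{\partial x_{it}\,\partial t}\,\Delta x_{it}\,\Delta t + \cdots. \]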


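The discretized version of Eq. (6.21) is

\[ \Delta x_{it} = \mu_i(x_{it})\,\Delta t + \sigma_i(x_{it})\,\Delta w_{it}, \quad i = 1, \ldots, k. \]

Using a similar argument as that of Eq. (6.5) in Section 6.3, we can obtain that

\[ \lim_{\Delta t \to 0} (\Delta x_{it})^2 \to \sigma_i^2(x_{it})\,dt, \tag{6.22} \]

\[ \lim_{\Delta t \to 0} (\Delta x_{it}\,\Delta x_{jt}) \to \sigma_i(x_{it})\,\sigma_j(x_{jt})\,\rho_{ij}\,dt. \tag{6.23} \]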
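Using Eqs. (6.21)-(6.23), taking the limit as \(\Delta t \to 0\), and ignoring higher order
terms of \(\Delta t\), we have

\[ dG_t = \left[ \sum_{i=1}^{k} \frac{\partial G_t}{\partial x_{it}}\,\mu_i(x_{it}) + \frac{\partial G_t}{\partial t}
 + \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k} \frac{\partial^2 G_t}{\partial x_{it}\,\partial x_{jt}}\,\sigma_i(x_{it})\,\sigma_j(x_{jt})\,\rho_{ij} \right] dt
 + \sum_{i=1}^{k} \frac{\partial G_t}{\partial x_{it}}\,\sigma_i(x_{it})\,dw_{it}. \tag{6.24} \]

This is a generalization of Ito’s lemma to the case of multiple stochastic processes.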
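
The limits in Eqs. (6.22) and (6.23) can be illustrated by simulation. The sketch below is a special case chosen by us for simplicity: two processes with constant drifts and volatilities and a fixed correlation \(\rho_{12}\); all numerical values are hypothetical. Summing \((\Delta x_{1t})^2\) and \(\Delta x_{1t}\,\Delta x_{2t}\) over a fine grid on \([0, t]\) should give values close to \(\sigma_1^2 t\) and \(\sigma_1\sigma_2\rho_{12} t\).

```python
import random
from math import sqrt

random.seed(1)
mu1, mu2 = 0.05, 0.10            # constant drifts (illustrative values)
s1, s2, rho = 0.2, 0.3, 0.5      # constant volatilities and correlation
n, t = 100_000, 1.0
dt = t / n

sum11 = sum12 = 0.0
for _ in range(n):
    e1 = random.gauss(0.0, 1.0)
    # e2 is standard normal with correlation rho to e1
    e2 = rho * e1 + sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
    dx1 = mu1 * dt + s1 * sqrt(dt) * e1   # discretized Eq. (6.21), i = 1
    dx2 = mu2 * dt + s2 * sqrt(dt) * e2   # discretized Eq. (6.21), i = 2
    sum11 += dx1 * dx1
    sum12 += dx1 * dx2

print("sum of dx1^2    :", sum11, " target:", s1 ** 2 * t)
print("sum of dx1*dx2  :", sum12, " target:", s1 * s2 * rho * t)
```

The drift terms contribute only at order \(\Delta t^2\) to each product, which is why they drop out of Eq. (6.24).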


6.8 STOCHASTIC INTEGRAL 


We briefly discuss stochastic integration so that the price of an asset can be obtained
under the assumption that it follows an Ito process. We deduce the integration result
using Ito’s formula. For a rigorous treatment of the topic, readers may consult
textbooks on stochastic calculus. First, like the usual integration of a deterministic
function, integration is the opposite of differentiation so that

\[ \int_0^t dx_s = x_t - x_0 \]

continues to hold for a stochastic process \(x_t\). In particular, for the Wiener process
\(w_t\), we have \(\int_0^t dw_s = w_t\) because \(w_0 = 0\). Next, consider the integration
\(\int_0^t w_s\,dw_s\). Using the prior result and taking integration of Eq. (6.7), we have

\[ w_t^2 = t + 2\int_0^t w_s\,dw_s. \]

Therefore,

\[ \int_0^t w_s\,dw_s = \frac{1}{2}\,(w_t^2 - t). \]

This is different from the usual deterministic integration for which
\(\int_0^t y\,dy = (y_t^2 - y_0^2)/2\).
Turn to the case that \(x_t\) is a geometric Brownian motion; that is, \(x_t\) satisfies

\[ dx_t = \mu x_t\,dt + \sigma x_t\,dw_t, \]

where \(\mu\) and \(\sigma\) are constant with \(\sigma > 0\); see Eq. (6.8). Applying Ito’s lemma to
\(G(x_t, t) = \ln(x_t)\), we obtain

\[ d\ln(x_t) = \left(\mu - \frac{\sigma^2}{2}\right) dt + \sigma\,dw_t. \]

Performing the integration and using the results obtained before, we have

\[ \int_0^t d\ln(x_s) = \left(\mu - \frac{\sigma^2}{2}\right)\int_0^t ds + \sigma\int_0^t dw_s. \]

Consequently,

\[ \ln(x_t) = \ln(x_0) + (\mu - \sigma^2/2)\,t + \sigma w_t, \]

and

\[ x_t = x_0 \exp[(\mu - \sigma^2/2)\,t + \sigma w_t]. \]

Changing the notation \(x_t\) to \(P_t\) for the price of an asset, we have a solution for the
price under the assumption that it is a geometric Brownian motion. The price is

\[ P_t = P_0 \exp[(\mu - \sigma^2/2)\,t + \sigma w_t]. \tag{6.25} \]
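
The stochastic integral \(\int_0^t w_s\,dw_s\) can be approximated on a simulated path by a sum in which the integrand is evaluated at the left end point of each subinterval. The sketch below is an illustration under our own choice of grid, seed, and parameter values; it compares that sum with \((w_t^2 - t)/2\) and then evaluates Eq. (6.25) on the same path.

```python
import random
from math import exp, sqrt

random.seed(7)
n, t = 100_000, 1.0
dt = t / n

w = 0.0          # Wiener path, w_0 = 0
ito_sum = 0.0    # left-end-point sum approximating the integral of w_s dw_s
for _ in range(n):
    dw = sqrt(dt) * random.gauss(0.0, 1.0)
    ito_sum += w * dw      # integrand taken before the increment is added
    w += dw

closed_form = 0.5 * (w ** 2 - t)
print("left-end-point sum:", ito_sum)
print("(w_t^2 - t)/2     :", closed_form)

# Eq. (6.25) evaluated on the same path (P0, mu, sigma are illustrative values).
P0, mu, sigma = 80.0, 0.10, 0.20
P_t = P0 * exp((mu - 0.5 * sigma ** 2) * t + sigma * w)
print("P_t from Eq. (6.25):", P_t)
```

The two numbers differ by \((t - \sum (\Delta w)^2)/2\), which shrinks as the grid is refined; this is the discrete counterpart of the extra term \(-t/2\) relative to deterministic integration.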


6.9 JUMP DIFFUSION MODELS 


Empirical studies have found that the stochastic diffusion model based on Brownian
motion fails to explain some characteristics of asset returns and the prices of their
derivatives [e.g., the “volatility smile” of implied volatilities; see Bakshi, Cao, and
Chen (1997) and the references therein]. The volatility smile refers to the convex
relationship between the implied volatility and the strike price of an option. Both
out-of-the-money and in-the-money options tend to have higher implied volatilities
than at-the-money options, especially in the foreign exchange markets. The volatility
smile is less pronounced for equity options. The inadequacy of the standard stochastic
diffusion model has led to the development of alternative continuous-time models.
For example, jump diffusion and stochastic volatility models have been proposed
in the literature to overcome the inadequacy; see Merton (1976) and Duffie (1995).

Jumps in stock prices are often assumed to follow a probability law. For example,
the jumps may follow a Poisson process, which is a continuous-time discrete process.
For a given time \(t\), let \(X_t\) be the number of times a special event occurs
during the time period \([0, t]\). Then \(X_t\) is a Poisson process if

\[ \Pr(X_t = m) = \frac{\lambda^m t^m}{m!}\exp(-\lambda t), \quad \lambda > 0. \]

That is, \(X_t\) follows a Poisson distribution with parameter \(\lambda t\). The parameter \(\lambda\)
governs the occurrence of the special event and is referred to as the rate or intensity
of the process. A formal definition also requires that \(X_t\) be a right-continuous
homogeneous Markov process with left-hand limit.

In this section, we discuss a simple jump diffusion model proposed by Kou
(2002). This simple model enjoys several nice properties. The returns implied
by the model are leptokurtic and asymmetric with respect to zero. In addition,
the model can reproduce volatility smile and provide analytical formulas for the
prices of many options. The model consists of two parts, with the first part being
continuous and following a geometric Brownian motion and the second part being
a jump process. The occurrences of jump are governed by a Poisson process, and
the jump size follows a double exponential distribution. Let \(P_t\) be the price of an
asset at time \(t\). The simple jump diffusion model postulates that the price follows
the stochastic differential equation

\[ \frac{dP_t}{P_t} = \mu\,dt + \sigma\,dw_t + d\left(\sum_{i=1}^{n_t} (J_i - 1)\right), \tag{6.26} \]

where \(w_t\) is a Wiener process, \(n_t\) is a Poisson process with rate \(\lambda\), and \(\{J_i\}\) is a
sequence of independent and identically distributed nonnegative random variables
such that \(X = \ln(J)\) has a double exponential distribution with probability density
function

\[ f_X(x) = \frac{1}{2\eta}\,e^{-|x - \kappa|/\eta}, \quad 0 < \eta < 1. \tag{6.27} \]

The double exponential distribution is also referred to as the Laplacian distribution.
In model (6.26), \(n_t\), \(w_t\), and \(J_i\) are independent so that there is no relation between
the sources of randomness of the model. Notice that \(n_t\) is the number of jumps in the time
interval \([0, t]\) and follows a Poisson distribution with parameter \(\lambda t\), where \(\lambda\) is a
constant. At the \(i\)th jump, the proportion of price jump is \(J_i - 1\).

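The double exponential distribution can be written as

\[ X - \kappa = \begin{cases} \xi & \text{with probability } 0.5, \\ -\xi & \text{with probability } 0.5, \end{cases} \tag{6.28} \]

where \(\xi\) is an exponential random variable with mean \(\eta\) and variance \(\eta^2\). The
probability density function of \(\xi\) is

\[ f(x) = \frac{1}{\eta}\,e^{-x/\eta}, \quad 0 < x < \infty. \]

Some useful properties of the double exponential distribution are

\[ E(X) = \kappa, \quad \mathrm{Var}(X) = 2\eta^2, \quad E(e^X) = \frac{e^{\kappa}}{1 - \eta^2}. \]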
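
The representation in Eq. (6.28) gives a direct way to simulate the double exponential distribution and to check the three moments above. The sketch below is illustrative; the function name `rdexp`, the seed, and the sample size are our own choices, and the parameter values \(\kappa = -0.02\) and \(\eta = 0.02\) are those used later in this section.

```python
import random
from math import exp

def rdexp(kappa, eta, rng):
    """Draw X = kappa + xi or kappa - xi (probability 0.5 each), xi exponential with mean eta."""
    xi = rng.expovariate(1.0 / eta)   # expovariate takes the rate 1/mean
    return kappa + xi if rng.random() < 0.5 else kappa - xi

rng = random.Random(42)
kappa, eta, n = -0.02, 0.02, 200_000
xs = [rdexp(kappa, eta, rng) for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / (n - 1)
mean_exp = sum(exp(x) for x in xs) / n

print("E(X)     :", mean, " theory:", kappa)
print("Var(X)   :", var, " theory:", 2 * eta ** 2)
print("E(exp X) :", mean_exp, " theory:", exp(kappa) / (1 - eta ** 2))
```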


For finite samples, it is hard to distinguish a double exponential distribution from a
Student-t distribution. However, a double exponential distribution is more tractable
analytically and can generate a higher probability concentration (e.g., higher peak)
around its mean value. As stated in Chapter 1, histograms of observed asset returns
tend to have a higher peak than the normal density. Figure 6.8 shows the probability
density function of a double exponential random variable in the solid line and that
of a normal random variable in the dotted line. Both variables have mean zero and
variance 0.0008. The high peak of the double exponential density is clearly seen.

Figure 6.8 Probability density functions of double exponential and normal random variable with
mean zero and variance 0.0008. Solid line denotes the double exponential distribution. Dotted line is
the normal distribution.

Solving the stochastic differential equation in Eq. (6.26), we obtain the dynamics
of the asset price as

\[ P_t = P_0 \exp\left[\left(\mu - \frac{\sigma^2}{2}\right) t + \sigma w_t\right] \prod_{i=1}^{n_t} J_i, \tag{6.29} \]

where it is understood that \(\prod_{i=1}^{0} = 1\). This result is a generalization of Eq. (6.25)
by including the stochastic jumps. It can be obtained as follows. Let \(t_i\) be the time
of the \(i\)th jump. For \(t \in [0, t_1)\), there is no jump and the price is given in Eq. (6.25).
Consequently, the left-hand price limit at time \(t_1\) is

\[ P_{t_1^-} = P_0 \exp[(\mu - \sigma^2/2)\,t_1 + \sigma w_{t_1}]. \]

At time \(t_1\), the proportion of price jump is \(J_1 - 1\) so that the price becomes

\[ P_{t_1} = (1 + J_1 - 1)\,P_{t_1^-} = J_1 P_{t_1^-} = P_0 \exp[(\mu - \sigma^2/2)\,t_1 + \sigma w_{t_1}]\,J_1. \]

For \(t \in (t_1, t_2)\), there is no jump in the interval \((t_1, t]\) so that

\[ P_t = P_{t_1} \exp[(\mu - \sigma^2/2)(t - t_1) + \sigma (w_t - w_{t_1})]. \]

Plugging in \(P_{t_1}\), we have

\[ P_t = P_0 \exp[(\mu - \sigma^2/2)\,t + \sigma w_t]\,J_1. \]

Repeating the scheme, we obtain Eq. (6.29).


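From Eq. (6.29), the simple return of the underlying asset in a small time
increment \(\Delta t\) becomes

\[ \frac{P_{t+\Delta t} - P_t}{P_t} = \exp\left[\left(\mu - \frac{1}{2}\sigma^2\right)\Delta t + \sigma (w_{t+\Delta t} - w_t) + \sum_{i=n_t+1}^{n_{t+\Delta t}} X_i\right] - 1, \]

where it is understood that a summation over an empty set is zero and \(X_i = \ln(J_i)\).
For a small \(\Delta t\), we may use the approximation \(e^x \approx 1 + x + x^2/2\) and the result
\((\Delta w_t)^2 \approx \Delta t\) discussed in Section 6.3 to obtain

\[ \frac{P_{t+\Delta t} - P_t}{P_t} \approx \left(\mu - \frac{1}{2}\sigma^2\right)\Delta t + \sigma\,\Delta w_t + \sum_{i=n_t+1}^{n_{t+\Delta t}} X_i + \frac{1}{2}\sigma^2 (\Delta w_t)^2 \]
\[ \phantom{\frac{P_{t+\Delta t} - P_t}{P_t}} \approx \mu\,\Delta t + \sigma\epsilon\sqrt{\Delta t} + \sum_{i=n_t+1}^{n_{t+\Delta t}} X_i, \]

where \(\Delta w_t = w_{t+\Delta t} - w_t\) and \(\epsilon\) is a standard normal random variable.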

Under the assumption of a Poisson process, the probability of having one jump
in the time interval \((t, t + \Delta t]\) is \(\lambda\,\Delta t\) and that of having more than one jump is
\(o(\Delta t)\), where the symbol \(o(\Delta t)\) means that if we divide this term by \(\Delta t\) then its
value tends to zero as \(\Delta t\) tends to zero. Therefore, for a small \(\Delta t\), by ignoring
multiple jumps, we have

\[ \sum_{i=n_t+1}^{n_{t+\Delta t}} X_i \approx \begin{cases} X_{n_t+1} & \text{with probability } \lambda\,\Delta t, \\ 0 & \text{with probability } 1 - \lambda\,\Delta t. \end{cases} \]

Combining the prior results, we see that the simple return of the underlying asset
is approximately distributed as

\[ \frac{P_{t+\Delta t} - P_t}{P_t} \approx \mu\,\Delta t + \sigma\epsilon\sqrt{\Delta t} + I \times X, \tag{6.30} \]

where \(I\) is a Bernoulli random variable with \(\Pr(I = 1) = \lambda\,\Delta t\) and \(\Pr(I = 0) =
1 - \lambda\,\Delta t\), and \(X\) is a double exponential random variable defined in Eq. (6.28).
Without jumps, Eq. (6.30) reduces to that of a geometric Brownian motion.
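Let \(G = \mu\,\Delta t + \sigma\epsilon\sqrt{\Delta t} + I \times X\) be the random variable on the right-hand
side of Eq. (6.30). Using the independence between the exponential and normal
distributions used in the model, Kou (2002) obtains the probability density function
of \(G\) as

\[ g(x) = \frac{\lambda\,\Delta t}{2\eta}\,e^{\sigma^2 \Delta t/(2\eta^2)} \left[ e^{-\omega/\eta}\,\Phi\!\left(\frac{\omega\eta - \sigma^2 \Delta t}{\sigma\eta\sqrt{\Delta t}}\right) + e^{\omega/\eta}\,\Phi\!\left(-\frac{\omega\eta + \sigma^2 \Delta t}{\sigma\eta\sqrt{\Delta t}}\right) \right]
 + (1 - \lambda\,\Delta t)\,\frac{1}{\sigma\sqrt{\Delta t}}\,f\!\left(\frac{x - \mu\,\Delta t}{\sigma\sqrt{\Delta t}}\right), \tag{6.31} \]

where \(\omega = x - \mu\,\Delta t - \kappa\), and \(f(\cdot)\) and \(\Phi(\cdot)\) are, respectively, the probability
density and cumulative distribution functions of the standard normal random variable.
Furthermore,

\[ E(G) = \mu\,\Delta t + \kappa\lambda\,\Delta t, \quad \mathrm{Var}(G) = \sigma^2 \Delta t + \lambda\,\Delta t\,[2\eta^2 + \kappa^2 (1 - \lambda\,\Delta t)]. \]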


Figure 6.9 shows some comparisons between probability density functions of a
normal distribution and the distribution of Eq. (6.31). Both distributions have mean
zero and variance \(2.0572 \times 10^{-4}\). The mean and variance are obtained by assuming
that the return of the underlying asset satisfies \(\mu = 20\%\) per annum, \(\sigma = 20\%\) per
annum, \(\Delta t = 1\) day \(= 1/252\) year, \(\lambda = 10\), \(\kappa = -0.02\), and \(\eta = 0.02\). In other
words, we assume that there are about 10 daily jumps per year with average jump
size -2%, and the jump size standard error is 2%. These values are reasonable
for a U.S. stock. From the plots, the leptokurtic feature of the distribution derived
from the jump diffusion process in Eq. (6.26) is clearly shown. The distribution
has a higher peak and fatter tails than the corresponding normal distribution.
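
The stated mean and variance can be verified by integrating the density of Eq. (6.31) numerically. The sketch below is illustrative: it uses the trapezoidal rule on a grid of our own choosing and the parameter values quoted above; the helper names `Phi`, `phi`, and `g` are ours.

```python
from math import erfc, exp, pi, sqrt

def Phi(z):
    """Standard normal CDF; erfc keeps accuracy in the far left tail."""
    return 0.5 * erfc(-z / sqrt(2.0))

def phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

mu, sigma, dt = 0.20, 0.20, 1.0 / 252
lam, kappa, eta = 10.0, -0.02, 0.02
s = sigma * sqrt(dt)                       # standard deviation of the diffusion part

def g(x):
    """Density of Eq. (6.31)."""
    w = x - mu * dt - kappa
    jump = (lam * dt / (2 * eta)) * exp(s * s / (2 * eta ** 2)) * (
        exp(-w / eta) * Phi((w * eta - s * s) / (s * eta))
        + exp(w / eta) * Phi(-(w * eta + s * s) / (s * eta))
    )
    return jump + (1 - lam * dt) * phi((x - mu * dt) / s) / s

# Trapezoidal rule on [-0.5, 0.5]; the density is negligible outside this range.
a, b, n = -0.5, 0.5, 20_000
h = (b - a) / n
xs = [a + i * h for i in range(n + 1)]
gs = [g(x) * (h / 2 if i in (0, n) else h) for i, x in enumerate(xs)]

total = sum(gs)
mean = sum(x * gx for x, gx in zip(xs, gs))
var = sum((x - mean) ** 2 * gx for x, gx in zip(xs, gs))

mean_cf = mu * dt + kappa * lam * dt
var_cf = sigma ** 2 * dt + lam * dt * (2 * eta ** 2 + kappa ** 2 * (1 - lam * dt))
print("integral:", total)
print("mean    :", mean, " closed form:", mean_cf)
print("variance:", var, " closed form:", var_cf)
```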


6.9.1 Option Pricing under Jump Diffusion 


In the presence of random jumps, the market becomes incomplete. In this case, the
standard hedging arguments are not applicable to price an option. But we can still
derive an option pricing formula that does not depend on attitudes toward risk by
assuming that the number of securities available is very large so that the risk of the
sudden jumps is diversifiable and the market will therefore pay no risk premium
over the risk-free rate for bearing this risk. Alternatively, for a given set of risk
premiums, one can consider a risk-neutral measure \(P^*\) such that

\[ \frac{dP_t}{P_t} = [r - \lambda E(J - 1)]\,dt + \sigma\,dw_t + d\left(\sum_{i=1}^{n_t} (J_i - 1)\right) \]
\[ \phantom{\frac{dP_t}{P_t}} = (r - \lambda\psi)\,dt + \sigma\,dw_t + d\left(\sum_{i=1}^{n_t} (J_i - 1)\right), \]

where \(r\) is the risk-free interest rate, \(J = \exp(X)\) such that \(X\) follows the double
exponential distribution of Eq. (6.27), \(\psi = e^{\kappa}/(1 - \eta^2) - 1\), \(0 < \eta < 1\), and the
parameters \(\kappa\), \(\eta\), \(\psi\), and \(\sigma\) become risk-neutral parameters taking consideration of
the risk premiums; see Kou (2002) for more details. The unique solution of the
prior equation is given by

\[ P_t = P_0 \exp\left[\left(r - \frac{\sigma^2}{2} - \lambda\psi\right) t + \sigma w_t\right] \prod_{i=1}^{n_t} J_i. \]

Figure 6.9 Density comparisons between normal distribution and distribution of Eq. (6.31). Dotted
line denotes the normal distribution. Both distributions have mean zero and variance \(2.0572 \times 10^{-4}\).
(a) Overall comparison, (b) comparison of peaks, (c) left tails, and (d) right tails.


To price a European option in the jump diffusion model, it remains to compute the
expectation, under the measure \(P^*\), of the discounted final payoff of the option. In
particular, the price of a European call option at time \(t\) is given by

\[ c_t = E^*\left[e^{-r(T-t)} (P_T - K)_+\right] \]
\[ \phantom{c_t} = E^*\left[ e^{-r(T-t)} \left( P_t \exp\left[\left(r - \frac{\sigma^2}{2} - \lambda\psi\right)(T - t) + \sigma\sqrt{T - t}\,\epsilon\right] \prod_{i=1}^{n_T} J_i - K \right)_+ \right], \tag{6.32} \]

where \(T\) is the expiration time, \((T - t)\) is the time to expiration measured in
years, \(K\) is the strike price, \((y)_+ = \max(0, y)\), and \(\epsilon\) is a standard normal random
variable. Kou (2002) shows that \(c_t\) is analytically tractable as


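\[ c_t = \sum_{n=1}^{\infty} \sum_{j=1}^{n} e^{-\lambda(T-t)}\,\frac{\lambda^n (T-t)^n}{n!}\,\frac{2^j}{2^{2n-1}} \binom{2n-j-1}{n-1} \left(A_{1,n,j} + A_{2,n,j} + A_{3,n,j}\right) \]
\[ \phantom{c_t =} + e^{-\lambda(T-t)} \left[ P_t\,e^{-\lambda\psi(T-t)}\,\Phi(h_+) - K e^{-r(T-t)}\,\Phi(h_-) \right], \tag{6.33} \]

where \(\Phi(\cdot)\) is the CDF of the standard normal random variable, the terms
\(A_{1,n,j}\), \(A_{2,n,j}\), and \(A_{3,n,j}\) are as given in Kou (2002),

\[ b_\pm = \frac{\ln(P_t/K) + (r \pm \sigma^2/2 - \lambda\psi)(T - t) + n\kappa}{\sigma\sqrt{T - t}}, \]

\[ h_\pm = \frac{\ln(P_t/K) + (r \pm \sigma^2/2 - \lambda\psi)(T - t)}{\sigma\sqrt{T - t}}, \]

\[ c_\pm = \frac{\sigma\sqrt{T - t}}{\eta} \pm \frac{\omega}{\sigma\sqrt{T - t}}, \]

\[ \omega = \ln\left(\frac{K}{P_t}\right) + \lambda\psi(T - t) - \left(r - \frac{\sigma^2}{2}\right)(T - t) - n\kappa, \]

\[ \psi = \frac{e^{\kappa}}{1 - \eta^2} - 1, \]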

and the \(Hh_n(\cdot)\) functions are defined as

\[ Hh_n(x) = \frac{1}{n!} \int_x^{\infty} (s - x)^n e^{-s^2/2}\,ds, \quad n = 0, 1, \ldots, \tag{6.34} \]

and \(Hh_{-1}(x) = \exp(-x^2/2)\), which is \(\sqrt{2\pi}\,f(x)\) with \(f(x)\) being the probability
density function of a standard normal random variable; see Abramowitz and Stegun
(1972). The \(Hh_n(x)\) functions satisfy the recursion

\[ n\,Hh_n(x) = Hh_{n-2}(x) - x\,Hh_{n-1}(x), \quad n \ge 1, \tag{6.35} \]

with starting values \(Hh_{-1}(x) = e^{-x^2/2}\) and \(Hh_0(x) = \sqrt{2\pi}\,\Phi(-x)\).

The pricing formula involves an infinite series, but its numerical value can be
approximated quickly and accurately through truncation (e.g., the first 10 terms).
Also, if \(\lambda = 0\) (i.e., there are no jumps), then it is easily seen that \(c_t\) reduces to the
Black-Scholes formula for a call option discussed before.
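
The recursion in Eq. (6.35) is straightforward to program. The following sketch is for illustration only: the function names `hh` and `hh_direct` are ours, and the second function simply applies the trapezoidal rule to the definition in Eq. (6.34) as a cross-check, with an integration range and step size chosen by us.

```python
from math import erfc, exp, factorial, pi, sqrt

def hh(n, x):
    """Hh_n(x) for n >= -1 via the recursion n*Hh_n = Hh_{n-2} - x*Hh_{n-1}."""
    prev = exp(-0.5 * x * x)                               # Hh_{-1}(x)
    if n == -1:
        return prev
    curr = sqrt(2.0 * pi) * 0.5 * erfc(x / sqrt(2.0))      # Hh_0(x) = sqrt(2*pi)*Phi(-x)
    for k in range(1, n + 1):
        prev, curr = curr, (prev - x * curr) / k
    return curr

def hh_direct(n, x, upper=12.0, steps=24_000):
    """Trapezoidal approximation of Eq. (6.34) after substituting u = s - x."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * u ** n * exp(-0.5 * (x + u) ** 2)
    return total * h / factorial(n)

for n in range(0, 4):
    print(n, hh(n, 0.5), hh_direct(n, 0.5))
```

The two columns should agree to several decimal places for moderate values of \(n\) and \(x\); we have not examined the accuracy of the forward recursion for large arguments.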

Finally, the price of a European put option under the jump diffusion model
considered can be obtained by using the put-call parity; that is,

\[ p_t = c_t + K e^{-r(T-t)} - P_t. \]

Pricing formulas for other options under the jump diffusion model in Eq. (6.26)
can be found in Kou (2002).


Example 6.8. Consider the stock of Example 6.6, which has a current price
of $80. As before, assume that the strike price of a European option is \(K = \$81\)
and other parameters are \(r = 0.08\) and \(T - t = 0.25\). In addition, assume that the
price of the stock follows the jump diffusion model in Eq. (6.26) with parameters
\(\lambda = 10\), \(\kappa = -0.02\), and \(\eta = 0.02\). In other words, there are about 10 jumps per
year with average jump size -2% and jump size standard error of 2%. Using the
formula in Eq. (6.33), we obtain \(c_t = \$3.92\), which is higher than the $3.49 of
Example 6.6 when there are no jumps. The corresponding put option assumes the
value \(p_t = \$3.31\), which is also higher than what we had before. As expected,
adding the jumps while keeping the other parameters fixed increases the prices of
both European options. Keep in mind, however, that adding the jump process to
the stock price in a real application often leads to different estimates for the stock
volatility \(\sigma\).
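
As a rough cross-check of Example 6.8 that does not rely on the series in Eq. (6.33), the expectation in Eq. (6.32) can be estimated by Monte Carlo simulation. The sketch below is our own illustration, not the method used in the text: it draws the number of jumps from exponential interarrival times, draws the log jump sizes from Eq. (6.28), and averages the discounted payoff. The function name, seed, and number of paths are arbitrary choices, and \(\sigma = 0.2\) is carried over from Example 6.6.

```python
import random
from math import exp, sqrt

def jump_call_mc(P, K, r, sigma, tau, lam, kappa, eta, n_paths, seed=0):
    """Monte Carlo estimate of a European call under the risk-neutral jump diffusion."""
    rng = random.Random(seed)
    psi = exp(kappa) / (1.0 - eta ** 2) - 1.0          # E(J) - 1
    drift = (r - 0.5 * sigma ** 2 - lam * psi) * tau
    total = 0.0
    for _ in range(n_paths):
        # Poisson arrivals on (0, tau]: accumulate exponential interarrival times.
        log_jumps, clock = 0.0, rng.expovariate(lam)
        while clock <= tau:
            xi = rng.expovariate(1.0 / eta)            # exponential with mean eta
            log_jumps += kappa + (xi if rng.random() < 0.5 else -xi)
            clock += rng.expovariate(lam)
        P_T = P * exp(drift + sigma * sqrt(tau) * rng.gauss(0.0, 1.0) + log_jumps)
        total += max(P_T - K, 0.0)
    return exp(-r * tau) * total / n_paths

call = jump_call_mc(80.0, 81.0, 0.08, 0.2, 0.25, 10.0, -0.02, 0.02, 100_000)
put = call + 81.0 * exp(-0.08 * 0.25) - 80.0           # put-call parity
print(f"call = {call:.2f}, put = {put:.2f}")
```

Up to simulation error, the estimates should be close to the call and put prices reported in the example.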


6.10 ESTIMATION OF CONTINUOUS-TIME MODELS 


Next, we consider the problem of estimating directly the diffusion equation (i.e.,
Ito process) from discretely sampled data. Here the drift and volatility functions
\(\mu(x_t, t)\) and \(\sigma(x_t, t)\) are time varying and may not follow a specific parametric
form. This is a topic of considerable interest in recent years. Details of the
available methods are beyond the scope of this chapter. Hence, we only outline
the approaches proposed in the literature. Interested readers can consult the
corresponding references and Lo (1988).

There are several approaches available for estimating a diffusion equation. The
first approach is the quasi-maximum-likelihood approach, which makes use of the
fact that for a small time interval \(dw_t\) is normally distributed; see Kessler (1997)
and the references therein. The second approach uses methods of moments; see
Conley, Hansen, Luttmer, and Scheinkman (1997) and the references therein. The
third approach uses nonparametric methods; see Ait-Sahalia (1996, 2002). The
fourth approach uses semiparametric and reprojection methods; see Gallant and
Long (1997) and Gallant and Tauchen (1997). Recently, many researchers have
applied Markov chain Monte Carlo methods to estimate the diffusion equation; see
Eraker (2001) and Elerian, Chib, and Shephard (2001).


APPENDIX A: INTEGRATION OF BLACK-SCHOLES FORMULA 


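In this appendix, we derive the price of a European call option given in Eq. (6.19).
Let \(x = \ln(P_T)\). By changing variable and using \(g(P_T)\,dP_T = f(x)\,dx\), where
\(f(x)\) is the probability density function of \(x\), we have

\[ c_t = \exp[-r(T-t)] \int_K^{\infty} (P_T - K)\,g(P_T)\,dP_T \]
\[ \phantom{c_t} = e^{-r(T-t)} \int_{\ln(K)}^{\infty} (e^x - K)\,f(x)\,dx \]
\[ \phantom{c_t} = e^{-r(T-t)} \left[ \int_{\ln(K)}^{\infty} e^x f(x)\,dx - K \int_{\ln(K)}^{\infty} f(x)\,dx \right]. \tag{6.36} \]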

Because \(x = \ln(P_T) \sim N[\ln(P_t) + (r - \sigma^2/2)(T - t),\ \sigma^2 (T - t)]\), the integration
of the second term of Eq. (6.36) reduces to

\[ \int_{\ln(K)}^{\infty} f(x)\,dx = 1 - \int_{-\infty}^{\ln(K)} f(x)\,dx = 1 - \mathrm{CDF}[\ln(K)] = 1 - \Phi(-h_-) = \Phi(h_-), \]

where \(\mathrm{CDF}[\ln(K)]\) is the cumulative distribution function (CDF) of \(x = \ln(P_T)\)
evaluated at \(\ln(K)\), \(\Phi(\cdot)\) is the CDF of the standard normal random variable, and

\[ -h_- = \frac{\ln(K) - \ln(P_t) - (r - \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}} = \frac{-\ln(P_t/K) - (r - \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}. \]
The integration of the first term of Eq. (6.36) can be written as

\[ \int_{\ln(K)}^{\infty} \frac{1}{\sqrt{2\pi}\sqrt{\sigma^2 (T - t)}} \exp\left( x - \frac{[x - \ln(P_t) - (r - \sigma^2/2)(T - t)]^2}{2\sigma^2 (T - t)} \right) dx, \]

where the exponent can be simplified to

\[ x - \frac{\{x - [\ln(P_t) + (r - \sigma^2/2)(T - t)]\}^2}{2\sigma^2 (T - t)} = -\frac{\{x - [\ln(P_t) + (r + \sigma^2/2)(T - t)]\}^2}{2\sigma^2 (T - t)} + \ln(P_t) + r(T - t). \]

Consequently, the first integration becomes

\[ \int_{\ln(K)}^{\infty} e^x f(x)\,dx = P_t\,e^{r(T-t)} \int_{\ln(K)}^{\infty} \frac{1}{\sqrt{2\pi}\sqrt{\sigma^2 (T - t)}} \exp\left( -\frac{\{x - [\ln(P_t) + (r + \sigma^2/2)(T - t)]\}^2}{2\sigma^2 (T - t)} \right) dx, \]

which involves the CDF of a normal distribution with mean \(\ln(P_t) + (r +
\sigma^2/2)(T - t)\) and variance \(\sigma^2 (T - t)\). By using the same techniques as those of
the second integration shown before, we have

\[ \int_{\ln(K)}^{\infty} e^x f(x)\,dx = P_t\,e^{r(T-t)}\,\Phi(h_+), \]

where \(h_+\) is given by

\[ h_+ = \frac{\ln(P_t/K) + (r + \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}. \]

Putting the two integration results together, we have

\[ c_t = e^{-r(T-t)} \left[ P_t\,e^{r(T-t)}\,\Phi(h_+) - K\,\Phi(h_-) \right] = P_t\,\Phi(h_+) - K e^{-r(T-t)}\,\Phi(h_-). \]


APPENDIX B: APPROXIMATION TO STANDARD NORMAL 
PROBABILITY 


The CDF \(\Phi(x)\) of a standard normal random variable can be approximated by

\[ \Phi(x) = \begin{cases} 1 - f(x)\,[c_1 k + c_2 k^2 + c_3 k^3 + c_4 k^4 + c_5 k^5] & \text{if } x \ge 0, \\ 1 - \Phi(-x) & \text{if } x < 0, \end{cases} \]

where \(f(x) = \exp(-x^2/2)/\sqrt{2\pi}\), \(k = 1/(1 + 0.2316419x)\), \(c_1 = 0.319381530\),
\(c_2 = -0.356563782\), \(c_3 = 1.781477937\), \(c_4 = -1.821255978\), and \(c_5 =
1.330274429\).

For illustration, using the earlier approximation, we obtain \(\Phi(1.96) = 0.975002\),
\(\Phi(0.82) = 0.793892\), and \(\Phi(-0.61) = 0.270931\). These probabilities are very
close to those obtained from a typical normal probability table.
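
The approximation is easy to program. The Python sketch below is illustrative; the function names are ours, and `phi_erf` uses the error function of the standard library as an independent reference.

```python
from math import erf, exp, pi, sqrt

# Coefficients c1, ..., c5 of the approximation in this appendix.
C = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def phi_approx(x):
    """Polynomial approximation to the standard normal CDF."""
    if x < 0:
        return 1.0 - phi_approx(-x)
    k = 1.0 / (1.0 + 0.2316419 * x)
    f = exp(-0.5 * x * x) / sqrt(2.0 * pi)
    poly = sum(c * k ** (i + 1) for i, c in enumerate(C))
    return 1.0 - f * poly

def phi_erf(x):
    """Reference value of the standard normal CDF from the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

for x in (1.96, 0.82, -0.61):
    print(x, round(phi_approx(x), 6), round(phi_erf(x), 6))
```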
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EXERCISES 


6.1. 


6.2. 


6.3. 


6.4. 


6:5. 


6.6. 


6.7. 


Assume that the log price p; = ln(P,) follows a stochastic differential 
equation 


dp; = y dt + o dw,, 


where w; is a Wiener process. Derive the stochastic equation for the price P;. 
Considering the forward price F of a nondividend-paying stock, we have 


Fir = Pe T9, 


where r is the risk-free interest rate, which is constant, and P, is the cur- 
rent stock price. Suppose P, follows the geometric Brownian motion d P, = 
uP, dt + o P, dw;. Derive a stochastic diffusion equation for F; r. 


Assume that the price of IBM stock follows the Ito process 
dP, = uP,dt + o P, dw,, 


where u and o are constant and w; is a standard Brownian motion. Consider 
the daily log returns of IBM stock in 1997. The average return and the sample 
standard deviation are 0.00131 and 0.02215, respectively. Use the data to 
estimate the parameters u and ø assuming that there were 252 trading days 
in 1997. 


Suppose that the current price of a stock is $120 per share with volatility 
o = 50% per annum. Suppose further that the risk-free interest rate is 7% per 
annum and the stock pays no dividend. (a) What is the price of a European 
call option contingent on the stock with a strike price of $125 that will expire 
in 3 months? (b) What is the price of a European put option on the same stock 
with a strike price of $118 that will expire in 3 months? If the volatility o 
is increased to 80% per annum, then what are the prices of the two options? 
Derive the limiting marginal effects of the five variables K, P,, T — t, o, 
and r on a European put option contingent on a stock. 

A stock price is currently $60 per share and follows the geometric Brownian 
motion dP, = uP, dt + oP, dt. Assume that the expected return u from the 
stock is 20% per annum and its volatility is 40% per annum. What is the 
probability distribution for the stock price in 2 years? Obtain the mean and 
standard deviation of the distribution and construct a 95% confidence interval 
for the stock price. 

A stock price is currently $60 per share and follows the geometric Brownian 
motion dP, = uP, dt +oP,dt. Assume that the expected return u from 
the stock is 20% per annum and its volatility is 40% per annum. What is 
the probability distribution for the continuously compounded rate of return 
of the stock over 2 years? Obtain the mean and standard deviation of the 
distribution. 
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6.8. Suppose that the current price of stock A is $70 per share and the price 
follows the jump diffusion model in Eq. (6.26). Assume that the risk-free 
interest rate is 8% per annum, the stock pays no dividend, and its volatility 
(o) is 30% per annum. In addition, the price on average has about 15 jumps 
per year with average jump size —2% and jump standard error 3%. What is 
the price of a European call option with strike price $75 that will expire in 
3 months? What is the price of the corresponding European put option? 

6.9. Consider the European call option of a nondividend-paying stock. Suppose 
that P, = $20, K = $18, r = 6% per annum, and T — t = 0.5 year. If the 
price of a European call option of the stock is $2.10, what opportunities are 
there for an arbitrageur? 

6.10. Consider the put option of a nondividend-paying stock. Suppose that \(P_t = \$44\),
\(K = \$47\), \(r = 6\%\) per annum, and \(T - t = 0.5\) year. If the European
put option of the stock is selling at $1.00, what opportunities are there for 
an arbitrageur? 
CHAPTER 7


Extreme Values, Quantiles, 
and Value at Risk 


Extreme price movements in the financial markets are rare but important. The stock 
market crash on Wall Street in October 1987 and other big financial crises such as
the collapse of Long-Term Capital Management and the bankruptcy of Lehman Brothers have
attracted a great deal of attention among investors, practitioners, and researchers.
The recent worldwide financial crisis, characterized by a substantial increase in
market volatility [for example, in the volatility index (VIX) of the Chicago Board
Options Exchange] and big drops in market indices, has further generated
discussions on market risk and margin setting for financial institutions. As a
result, value at risk (VaR) has become the standard measure of market risk in risk 
management. Its usefulness and weaknesses are widely discussed. 

In this chapter, we discuss various methods for calculating VaR and the statistical 
theories behind these methods. In particular, we consider the extreme value theory 
developed in the statistical literature for studying rare (or extraordinary) events 
and its application to VaR. Both unconditional and conditional concepts of extreme 
values are discussed. The unconditional approach to VaR calculation for a financial 
position uses the historical returns of the instruments involved to compute VaR. 
On the other hand, a conditional approach uses the historical data and explanatory 
variables to calculate VaR. The explanatory variables may include macroeconomic 
variables of an economy and accounting variables of companies involved. 

Other approaches to VaR calculation discussed in the chapter are RiskMetrics, 
econometric modeling using volatility models, and empirical quantile. We use daily 
log returns of IBM stock to illustrate the actual calculation of all the methods 
discussed. The results obtained can therefore be used to compare the performance 
of different methods. Figure 7.1 shows the time plot of daily log returns of IBM 
stock from July 3, 1962, to December 31, 1998, for 9190 observations. 

VaR is a point estimate of potential financial loss. It contains a certain degree
of uncertainty. It also has a tendency to underestimate the actual loss if an extreme
event actually occurs. To overcome the weaknesses of VaR, we discuss other risk
measures such as expected shortfalls and the loss distribution of a financial position
in the chapter.

Figure 7.1 Time plot of daily log returns of IBM stock from July 3, 1962, to December 31, 1998.


7.1 VALUE AT RISK 


There are several types of risk in financial markets. Credit risk, operational risk, 
and market risk are the three main categories of financial risk. Value at risk (VaR) 
is mainly concerned with market risk, but the concept is also applicable to other 
types of risk. VaR is a single estimate of the amount by which an institution’s 
position in a risk category could decline due to general market movements during 
a given holding period; see Duffie and Pan (1997) and Jorion (2006) for a general 
exposition of VaR. The measure can be used by financial institutions to assess 
their risks or by a regulatory committee to set margin requirements. In either case, 
VaR is used to ensure that the financial institutions can still be in business after 
a catastrophic event. From the viewpoint of a financial institution, VaR can be 
defined as the maximal loss of a financial position during a given time period for a 
given probability. In this view, one treats VaR as a measure of loss associated with 
a rare (or extraordinary) event under normal market conditions. Alternatively, from 
the viewpoint of a regulatory committee, VaR can be defined as the minimal loss 
under extraordinary market circumstances. Both definitions will lead to the same 
VaR measure, even though the concepts appear to be different. 
In what follows, we define VaR under a probabilistic framework. Suppose that
at the time index \(t\) we are interested in the risk of a financial position for the
next \(\ell\) periods. Let \(\Delta V(\ell)\) be the change in value of the underlying assets of the
financial position from time \(t\) to \(t + \ell\) and \(L(\ell)\) be the associated loss function.
These two quantities are measured in dollars and are random variables at the time
index \(t\). \(L(\ell)\) is a positive or negative function of \(\Delta V(\ell)\) depending on the position
being short or long. Denote the cumulative distribution function (CDF) of \(L(\ell)\) by
\(F_\ell(x)\). We define the VaR of a financial position over the time horizon \(\ell\) with tail
probability \(p\) as

\[
p = \Pr[L(\ell) \ge \mathrm{VaR}] = 1 - \Pr[L(\ell) < \mathrm{VaR}]. \tag{7.1}
\]

From the definition, the probability that the position holder would encounter a loss
greater than or equal to VaR over the time horizon \(\ell\) is \(p\). Alternatively, VaR
can be interpreted as follows. With probability \(1 - p\), the potential loss encountered
by the holder of the financial position over the time horizon \(\ell\) is less than
VaR.

The previous definition shows that VaR is concerned with the upper tail behavior
of the loss CDF \(F_\ell(x)\). For any univariate CDF \(F_\ell(x)\) and probability \(q\), such that
\(0 < q < 1\), the quantity

\[
x_q = \inf\{x \mid F_\ell(x) \ge q\}
\]

is called the \(q\)th quantile of \(F_\ell(x)\), where inf denotes the smallest real number
\(x\) satisfying \(F_\ell(x) \ge q\). If the random variable \(L(\ell)\) of \(F_\ell(x)\) is continuous, then
\(q = \Pr[L(\ell) \le x_q]\).

If the CDF \(F_\ell(x)\) of Eq. (7.1) is known, then \(1 - p = \Pr[L(\ell) < \mathrm{VaR}]\) so that
VaR is simply the \((1 - p)\)th quantile of the CDF of the loss function \(L(\ell)\) (i.e.,
\(\mathrm{VaR} = x_{1-p}\)). Sometimes, VaR is referred to as the upper \(p\)th quantile because
\(p\) is the upper tail probability of the loss distribution. The CDF is unknown in
practice, however. Studies of VaR are essentially concerned with estimation of the
CDF and/or its quantile, especially the upper tail behavior of the loss CDF.
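The definition in Eq. (7.1) can be illustrated with a minimal R sketch. It assumes, purely for illustration, that the loss is normally distributed with a hypothetical mean and standard deviation; the VaR is then the (1 − p)th quantile of that distribution, and the tail probability beyond it is p.

```r
# Illustrative assumption: the loss L (in dollars) is N(mu, sigma^2).
mu    <- 0        # hypothetical mean loss
sigma <- 100000   # hypothetical standard deviation of the loss
p     <- 0.01     # upper tail probability

# VaR is the (1 - p)th quantile of the loss distribution.
VaR <- qnorm(1 - p, mean = mu, sd = sigma)

# Check Eq. (7.1): Pr[L >= VaR] should equal p.
tail_prob <- pnorm(VaR, mean = mu, sd = sigma, lower.tail = FALSE)
print(c(VaR = VaR, tail_prob = tail_prob))
```

In practice the loss CDF is unknown, so the quantile must be estimated; the normal distribution here only stands in for a known CDF.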

In real applications, calculation of VaR involves several factors: 


1. The probability of interest p, such as p = 0.01 for risk management and 
p = 0.001 in stress testing. 

2. The time horizon \(\ell\). It might be set by a regulatory committee, such as 1 day
or 10 days for market risk and 1 year or 5 years for credit risk.

3. The frequency of the data, which might not be the same as the time horizon
\(\ell\). Daily observations are often used in market risk analysis.

4. The CDF \(F_\ell(x)\) or its quantiles.

5. The amount of the financial position or the mark-to-market value of the 
portfolio. 
Among these factors, the CDF \(F_\ell(x)\) is the focus of econometric modeling.
Different methods for estimating the CDF give rise to different approaches to VaR 
calculation. 


Remark. The definition of VaR in Eq. (7.1) is based on the upper tail of 
a loss function. For a long financial position, loss occurs when the returns are 
negative. Therefore, we shall use negative returns in data analysis for a long 
financial position. Furthermore, the VaR defined in Eq. (7.1) is in dollar amount. 
Since log returns correspond approximately to percentage changes in value of
a financial asset, we use log returns \(r_t\) in data analysis. The VaR calculated
from the upper quantile of the distribution of \(r_{t+1}\) given information available
at time \(t\) is therefore in percentage. The dollar amount of VaR is then the cash
value of the financial position times the VaR of the log return series. That is,
VaR = Value × VaR(of log returns). If necessary, one can also use the approximation
VaR = Value × [exp(VaR of log returns) − 1].
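The two conversions in the remark can be compared numerically. The following R sketch uses a hypothetical position value and a hypothetical VaR of the log return; neither number comes from the text.

```r
value   <- 1e7    # hypothetical cash value of the position, in dollars
var.log <- 0.03   # hypothetical VaR of the log return (3%)

# Dollar VaR treating the log return as a percentage change.
var.lin <- value * var.log

# Dollar VaR using the approximation Value x [exp(VaR of log returns) - 1].
var.exp <- value * (exp(var.log) - 1)

print(c(linear = var.lin, exponential = var.exp))
```

For small log-return VaR the two figures are close; the gap widens as the VaR of the log return grows.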


Remark. VaR is a prediction concerning possible loss of a portfolio in a 
given time horizon. It should be computed using the predictive distribution of 
future returns of the financial position. For example, the VaR for a 1-day horizon
of a portfolio using daily returns \(r_t\) should be calculated using the predictive
distribution of \(r_{t+1}\) given information available at time \(t\). From a statistical viewpoint,
predictive distribution takes into account the parameter uncertainty in a
properly specified model. However, predictive distribution is hard to obtain, and 
most of the available methods for VaR calculation ignore the effects of parameter 
uncertainty. 


Remark. From the prior discussion, VaR is just a quantile of the loss function. 
It does not fully describe the upper tail behavior of the loss function. In practice, 
two assets may have the same VaR yet encounter different losses when the VaR is 
exceeded. Furthermore, the VaR does not satisfy the sub-additivity property, which 
states that a risk measure for two portfolios after they have been merged should be 
no greater than the sum of their risk measures before they were merged. Therefore, 
care must be exercised in using VaR to measure risk. We discuss the concept of 
expected shortfall later as an alternative to measuring risk. The expected shortfall 
is also known as the conditional value at risk (CVaR). 


7.2 RISKMETRICS 


J. P. Morgan developed the RiskMetrics methodology for VaR calculation; see
Longerstaey and More (1995). In its simple form, RiskMetrics assumes that the
continuously compounded daily return of a portfolio follows a conditional normal
distribution. Denote the daily log return by \(r_t\) and the information set available at
time \(t - 1\) by \(F_{t-1}\). RiskMetrics assumes that \(r_t \mid F_{t-1} \sim N(\mu_t, \sigma_t^2)\), where \(\mu_t\) is the
conditional mean and \(\sigma_t^2\) is the conditional variance of \(r_t\). In addition, the method
assumes that the two quantities evolve over time according to the simple model:

\[
\mu_t = 0, \qquad \sigma_t^2 = \alpha\sigma_{t-1}^2 + (1 - \alpha) r_{t-1}^2, \qquad 1 > \alpha > 0. \tag{7.2}
\]

Therefore, the method assumes that the logarithm of the daily price, \(p_t = \ln(P_t)\),
of the portfolio satisfies the difference equation \(p_t - p_{t-1} = a_t\), where \(a_t = \sigma_t\epsilon_t\)
is an IGARCH(1,1) process without drift. The value of \(\alpha\) is often in the interval
(0.9, 1) with a typical value of 0.94.

A nice property of such a special random-walk IGARCH model is that the
conditional distribution of a multiperiod return is easily available. Specifically, for
a \(k\)-period horizon, the log return from time \(t + 1\) to time \(t + k\) (inclusive) is
\(r_t[k] = r_{t+1} + \cdots + r_{t+k-1} + r_{t+k}\). We use the square bracket \([k]\) to denote a \(k\)-horizon
return. Under the special IGARCH(1,1) model in Eq. (7.2), the conditional
distribution \(r_t[k] \mid F_t\) is normal with mean zero and variance \(\sigma_t^2[k]\), where \(\sigma_t^2[k]\)
can be computed using the forecasting method discussed in Chapter 3. Using the
independence assumption of \(\epsilon_t\) and model (7.2), we have

\[
\sigma_t^2[k] = \mathrm{Var}(r_t[k] \mid F_t) = \sum_{i=1}^{k} \mathrm{Var}(a_{t+i} \mid F_t),
\]

where \(\mathrm{Var}(a_{t+i} \mid F_t) = E(\sigma_{t+i}^2 \mid F_t)\) can be obtained recursively. Using
\(r_{t-1} = a_{t-1} = \sigma_{t-1}\epsilon_{t-1}\), we can rewrite the volatility equation of the IGARCH(1,1) model in Eq.
(7.2) as

\[
\sigma_t^2 = \sigma_{t-1}^2 + (1 - \alpha)\sigma_{t-1}^2(\epsilon_{t-1}^2 - 1) \quad \text{for all } t.
\]

In particular, we have

\[
\sigma_{t+i}^2 = \sigma_{t+i-1}^2 + (1 - \alpha)\sigma_{t+i-1}^2(\epsilon_{t+i-1}^2 - 1) \quad \text{for } i = 2, \ldots, k.
\]

Since \(E(\epsilon_{t+i-1}^2 - 1 \mid F_t) = 0\) for \(i \ge 2\), the prior equation shows that

\[
E(\sigma_{t+i}^2 \mid F_t) = E(\sigma_{t+i-1}^2 \mid F_t) \quad \text{for } i = 2, \ldots, k. \tag{7.3}
\]

For the 1-step-ahead volatility forecast, Eq. (7.2) shows that \(\sigma_{t+1}^2 = \alpha\sigma_t^2 + (1 - \alpha)r_t^2\).
Therefore, Eq. (7.3) shows that \(\mathrm{Var}(r_{t+i} \mid F_t) = \sigma_{t+1}^2\) for \(i \ge 1\) and, hence,
\(\sigma_t^2[k] = k\sigma_{t+1}^2\). The results show that \(r_t[k] \mid F_t \sim N(0, k\sigma_{t+1}^2)\). Consequently, under
the special IGARCH(1,1) model in Eq. (7.2) the conditional variance of \(r_t[k]\) is
proportional to the time horizon \(k\). The conditional standard deviation of a \(k\)-period
horizon log return is then \(\sqrt{k\sigma_{t+1}^2}\), which is \(\sqrt{k}\) times \(\sigma_{t+1}\).

Given a tail probability, RiskMetrics uses the result \(r_t[k] \mid F_t \sim N(0, k\sigma_{t+1}^2)\)
to calculate VaR for the log return. If the tail probability is set to 5%, then
\(\mathrm{VaR} = 1.65\sigma_{t+1}\) for the next trading day. This is the upper 5% quantile (or the
95th percentile) of a normal distribution with mean zero and standard deviation
\(\sigma_{t+1}\). For the next \(k\) trading days, \(\mathrm{VaR}[k] = 1.65\sqrt{k}\,\sigma_{t+1}\), which is the 95th
percentile of \(N(0, k\sigma_{t+1}^2)\). Similarly, if the tail probability is 1%, then
\(\mathrm{VaR} = 2.326\sigma_{t+1}\) for the next trading day and \(\mathrm{VaR}[k] = 2.326\sqrt{k}\,\sigma_{t+1}\) for the next \(k\)
trading days.

Consider the case of 1% tail probability. The VaR for the portfolio under RiskMetrics
is then

\[
\mathrm{VaR} = \text{Amount of position} \times 2.326\sigma_{t+1},
\]

for the next trading day and that of a \(k\)-day horizon is

\[
\mathrm{VaR}(k) = \text{Amount of position} \times 2.326\sqrt{k}\,\sigma_{t+1},
\]

where the argument \((k)\) of VaR is used to denote the time horizon and the portfolio
value is measured in dollars. Consequently, under RiskMetrics, we have

\[
\mathrm{VaR}(k) = \sqrt{k} \times \mathrm{VaR}.
\]

This is referred to as the square root of time rule in VaR calculation under RiskMetrics.
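The recursion in Eq. (7.2) and the square root of time rule can be sketched in R as follows. This is a minimal illustration on simulated returns, not the RiskMetrics implementation: the function name `ewma_var`, the starting variance, and the simulated data are all assumptions made here.

```r
# Variance recursion of Eq. (7.2): sig2[i + 1] = alpha * sig2[i] + (1 - alpha) * r[i]^2.
# r: daily log returns; alpha: decay parameter;
# sig2.init: starting variance (an assumption; the sample variance is used by default).
ewma_var <- function(r, alpha = 0.94, sig2.init = var(r)) {
  n <- length(r)
  sig2 <- numeric(n + 1)
  sig2[1] <- sig2.init
  for (i in 1:n) {
    sig2[i + 1] <- alpha * sig2[i] + (1 - alpha) * r[i]^2
  }
  sig2  # the last element is the 1-step-ahead variance forecast
}

set.seed(1)
r <- rnorm(500, sd = 0.01)            # simulated returns, for illustration only
sig2 <- ewma_var(r)
sig.next <- sqrt(sig2[length(sig2)])  # forecast of sigma_{t+1}

k <- 10
VaR1 <- 2.326 * sig.next              # 1% VaR of the log return, 1-day horizon
VaRk <- 2.326 * sqrt(k) * sig.next    # 1% VaR of the log return, k-day horizon
print(c(VaR1 = VaR1, VaRk = VaRk, ratio = VaRk / VaR1))
```

Multiplying either figure by the amount of the position gives the dollar VaR.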

If the log returns are in percentages, then the 1% VaR for the next trading day
is VaR = Amount of position × \(2.326\sigma_{t+1}/100\), where \(\sigma_{t+1}\) is the volatility of the
percentage log returns.

Note that because RiskMetrics assumes log returns are normally distributed with
mean zero, the loss function is symmetric and VaR is the same for long and short
financial positions.


Example 7.1. The sample standard deviation of the continuously compounded 
daily return of the German mark/U.S. dollar exchange rate was about 0.53% 
in June 1997. Suppose that an investor was long in $10 million worth of 
mark/dollar exchange rate contract. Then the 5% VaR for a 1-day horizon of the 
investor is 


\[
\$10{,}000{,}000 \times (1.65 \times 0.0053) = \$87{,}450.
\]

The corresponding VaR for 10-day horizon is

\[
\$10{,}000{,}000 \times (\sqrt{10} \times 1.65 \times 0.0053) \approx \$276{,}541.
\]
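The arithmetic of Example 7.1 can be reproduced with a few lines of R; the variable names are chosen here for illustration.

```r
position <- 1e7      # $10 million position
sigma    <- 0.0053   # daily standard deviation (0.53%)

VaR1  <- position * 1.65 * sigma   # 5% VaR, 1-day horizon
VaR10 <- sqrt(10) * VaR1           # 5% VaR, 10-day horizon (square root of time rule)
print(round(c(VaR1 = VaR1, VaR10 = VaR10)))
```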


Example 7.2. Consider the daily IBM log returns of Figure 7.1. As mentioned 
in Chapter 1, the sample mean of the returns is significantly different from zero. 
However, for demonstration of VaR calculation using RiskMetrics, we assume in 
this example that the conditional mean is zero and the volatility of the returns 
follows an IGARCH(1,1) model without drift. The fitted model is 


\[
r_t = a_t, \qquad a_t = \sigma_t\epsilon_t, \qquad \sigma_t^2 = 0.9396\sigma_{t-1}^2 + (1 - 0.9396)a_{t-1}^2, \tag{7.4}
\]

where \(\{\epsilon_t\}\) is a standard Gaussian white noise series. As expected, this model is
rejected by the Q statistics. For instance, we have a highly significant statistic
\(Q(10) = 56.19\) for the squared standardized residuals.

From the data and the fitted model, we have \(r_{9190} = -0.0128\) and
\(\hat{\sigma}_{9190}^2 = 0.0003472\). Therefore, the 1-step-ahead volatility forecast is \(\hat{\sigma}_{9190}^2(1) = 0.000336\).
The 95% quantile of the conditional distribution \(r_{9191} \mid F_{9190}\) is
\(1.65 \times \sqrt{0.000336} = 0.03025\). Consequently, the 1-day horizon 5% VaR of a long position of $10 million
is

\[
\mathrm{VaR} = \$10{,}000{,}000 \times 0.03025 = \$302{,}500.
\]

The 99% quantile is \(2.326 \times \sqrt{0.000336} = 0.04265\), and the corresponding 1%
VaR for the same long position is $426,500.
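As a check on Example 7.2, the 1-step-ahead forecast follows directly from the fitted volatility equation in Eq. (7.4). The R sketch below plugs in the values quoted in the example; small differences from the printed figures are due to rounding.

```r
alpha     <- 0.9396      # fitted coefficient in Eq. (7.4)
r.last    <- -0.0128     # r_9190
sig2.last <- 0.0003472   # fitted conditional variance at t = 9190

# 1-step-ahead variance forecast from the IGARCH(1,1) recursion.
sig2.next <- alpha * sig2.last + (1 - alpha) * r.last^2

q95 <- 1.65 * sqrt(sig2.next)    # 95% quantile of the log return
q99 <- 2.326 * sqrt(sig2.next)   # 99% quantile of the log return
print(c(sig2.next = sig2.next, VaR.5pct = 1e7 * q95, VaR.1pct = 1e7 * q99))
```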


Remark. To implement RiskMetrics in S-Plus, one can use ewma1 (exponentially
weighted moving average of order 1) under the mgarch (multivariate
GARCH) command to obtain the estimate of \(1 - \alpha\). Then, use the command predict
to obtain volatility forecasts. For the IBM data used, the estimate of \(\alpha\) is
\(1 - 0.036 = 0.964\) and the 1-step-ahead volatility forecast is \(\hat{\sigma}_{9190}(1) = 0.01888\).
Please see the demonstration below. This leads to VaR = $10,000,000 × (1.65 ×
0.01888) = $311,520 and VaR = $439,187 for \(p = 0.05\) and 0.01, respectively.
These two values are slightly higher than those of Example 7.2, which are based
on estimates of the RATS package.


S-Plus Demonstration 
The following output has been simplified: 


> ibm.risk=mgarch(ibm~-1, ~ewma1)
> ibm.risk

ALPHA 0.036

> predict(ibm.risk, 2)
$sigma.pred 0.01888


7.2.1 Discussion 


An advantage of RiskMetrics is simplicity. It is easy to understand and apply. 
Another advantage is that it makes risk more transparent in the financial markets. 
However, as security returns tend to have heavy tails (or fat tails), the normality 
assumption used often results in underestimation of VaR. Other approaches to VaR 
calculation avoid making such an assumption. 

The square root of time rule is a consequence of the special model used by
RiskMetrics. If either the zero mean assumption or the special IGARCH(1,1)
model assumption of the log returns fails, then the rule is invalid. Consider the
simple model

\[
r_t = \mu + a_t, \qquad a_t = \sigma_t\epsilon_t, \qquad \mu \ne 0,
\]

\[
\sigma_t^2 = \alpha\sigma_{t-1}^2 + (1 - \alpha)a_{t-1}^2,
\]

where \(\{\epsilon_t\}\) is a standard Gaussian white noise series. The assumption that \(\mu \ne 0\)
holds for returns of many heavily traded stocks on the NYSE; see Chapter 1. For
this simple model, the distribution of \(r_{t+1}\) given \(F_t\) is \(N(\mu, \sigma_{t+1}^2)\). The 95% quantile
used to calculate the 1-period horizon VaR becomes \(\mu + 1.65\sigma_{t+1}\). For a \(k\)-period
horizon, the distribution of \(r_t[k]\) given \(F_t\) is \(N(k\mu, k\sigma_{t+1}^2)\), where as before
\(r_t[k] = r_{t+1} + \cdots + r_{t+k}\). The 95% quantile used in the \(k\)-period horizon VaR calculation
is \(k\mu + 1.65\sqrt{k}\,\sigma_{t+1} = \sqrt{k}(\sqrt{k}\mu + 1.65\sigma_{t+1})\). Consequently, \(\mathrm{VaR}(k) \ne \sqrt{k} \times \mathrm{VaR}\)
when the mean return is not zero. It is also easy to show that the rule fails when
the volatility model of the return is not an IGARCH(1,1) model without drift.
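A small numerical sketch shows how far the square root of time rule drifts once the mean is nonzero. The values of the mean, the volatility forecast, and the horizon below are hypothetical.

```r
mu       <- 0.0005   # hypothetical nonzero daily mean of the (negative) log return
sig.next <- 0.018    # hypothetical 1-step-ahead volatility forecast
k        <- 10       # horizon in days

VaR1 <- mu + 1.65 * sig.next                 # 95% quantile, 1-day horizon
VaRk <- k * mu + 1.65 * sqrt(k) * sig.next   # 95% quantile, k-day horizon

# Gap between the k-day quantile and the square-root-of-time value.
# Algebraically this equals (k - sqrt(k)) * mu, which is zero only when mu = 0.
gap <- VaRk - sqrt(k) * VaR1
print(c(VaRk = VaRk, sqrt.rule = sqrt(k) * VaR1, gap = gap))
```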


7.2.2 Multiple Positions 


In some applications, an investor may hold multiple positions and needs to compute
the overall VaR of the positions. RiskMetrics adopts a simple approach for
doing such a calculation under the assumption that daily log returns of each
position follow a random-walk IGARCH(1,1) model. The additional quantities
needed are the cross-correlation coefficients between the returns. Consider the
case of two positions. Let \(\mathrm{VaR}_1\) and \(\mathrm{VaR}_2\) be the VaR for the two positions
and \(\rho_{12}\) be the cross-correlation coefficient between the two returns, that is,
\(\rho_{12} = \mathrm{Cov}(r_{1t}, r_{2t})/[\mathrm{Var}(r_{1t})\mathrm{Var}(r_{2t})]^{0.5}\). Then the overall VaR of the investor is

\[
\mathrm{VaR} = \sqrt{\mathrm{VaR}_1^2 + \mathrm{VaR}_2^2 + 2\rho_{12}\mathrm{VaR}_1\mathrm{VaR}_2}.
\]

The generalization of VaR to a position consisting of \(m\) instruments is straightforward
as

\[
\mathrm{VaR} = \sqrt{\sum_{i=1}^{m}\mathrm{VaR}_i^2 + 2\sum_{i<j}^{m}\rho_{ij}\mathrm{VaR}_i\mathrm{VaR}_j},
\]

where \(\rho_{ij}\) is the cross-correlation coefficient between returns of the \(i\)th and \(j\)th
instruments and \(\mathrm{VaR}_i\) is the VaR of the \(i\)th instrument.

The prior formula is obtained using the assumption that the joint distribution
of the log returns of assets involved in the portfolio is multivariate normal with
mean zero and covariance matrix \(\Sigma_t\). Under such an assumption, the log return
of the portfolio is normal with mean zero and finite variance; see Appendix B of
Chapter 8 for properties of multivariate normal variables.
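The aggregation formula is a quadratic form in the vector of individual VaRs, which makes it easy to sketch in R. The function name `portfolio_var`, the individual VaR figures, and the correlation are hypothetical.

```r
# Overall VaR from individual VaRs and a correlation matrix with unit diagonal.
# t(VaR) %*% rho %*% VaR expands to sum(VaR_i^2) + 2 * sum_{i<j} rho_ij * VaR_i * VaR_j.
portfolio_var <- function(VaR, rho) {
  sqrt(as.numeric(t(VaR) %*% rho %*% VaR))
}

VaR <- c(100000, 200000)                  # hypothetical VaRs of two positions
rho <- matrix(c(1, 0.3, 0.3, 1), 2, 2)    # hypothetical cross-correlation of 0.3

total  <- portfolio_var(VaR, rho)
direct <- sqrt(VaR[1]^2 + VaR[2]^2 + 2 * 0.3 * VaR[1] * VaR[2])  # two-position formula
print(c(total = total, direct = direct, sum = sum(VaR)))
```

With a correlation below 1, the overall VaR is smaller than the sum of the individual VaRs.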


7.2.3 Expected Shortfall 


Given a tail probability \(p\), VaR is simply the \((1 - p)\)th quantile of the loss function.
In practice, the actual loss, if it occurs, can be greater than VaR. In this
sense, VaR may underestimate the actual loss. To have a better assessment of the
potential loss, one can consider the expected value of the loss function if the VaR
is exceeded. This consideration leads to the concept of expected shortfall (ES).
Under RiskMetrics, the loss function is normally distributed so that the conditional
distribution of the loss function given that a VaR is exceeded is a truncated (from
below) normal distribution. Properties such as mean and variance of a truncated
normal distribution have been well-studied in the statistical literature. We can use
the mean of the distribution to calculate expected shortfall. Specifically, consider
the standard normal distribution \(X \sim N(0, 1)\). For a given upper tail probability \(p\),
let \(q = 1 - p\) and \(\mathrm{VaR}_q\) be the associated VaR, that is, \(\mathrm{VaR}_q\) is the \(q\)th quantile of
\(X\). Then the expectation of \(X\) given \(X > \mathrm{VaR}_q\) is \(E(X \mid X > \mathrm{VaR}_q) = f(\mathrm{VaR}_q)/p\),
where \(f(x) = (1/\sqrt{2\pi})\exp(-x^2/2)\) is the pdf of \(X\). The expected shortfall for a
log return \(r_t\) with conditional distribution \(N(0, \sigma_t^2)\) is then

\[
\mathrm{ES}_q = \frac{f(\mathrm{VaR}_q)}{p}\,\sigma_t.
\]

For example, if \(p = 0.05\), then \(\mathrm{VaR}_{0.95} \approx 1.645\) and \(f(\mathrm{VaR}_q)/p = f(1.645)/0.05 = 2.0627\)
so that the expected shortfall under RiskMetrics is \(\mathrm{ES}_{0.95} = 2.0627\sigma_t\). If
\(p = 0.01\), then \(\mathrm{ES}_{0.99} = 2.6652\sigma_t\).
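The multipliers 2.0627 and 2.6652 can be recovered in R from the standard normal pdf and quantile functions. The helper name `es_factor` is introduced here for illustration, and the numerical integration is included only as a cross-check of the truncated-normal mean.

```r
# Expected shortfall multiplier f(VaR_q) / p for a standard normal loss.
es_factor <- function(p) {
  q <- qnorm(1 - p)   # VaR_q: the (1 - p)th quantile of N(0, 1)
  dnorm(q) / p        # mean of the normal truncated from below at VaR_q
}

es95 <- es_factor(0.05)
es99 <- es_factor(0.01)

# Cross-check: E(X | X > VaR_q) computed by numerical integration.
es95.num <- integrate(function(x) x * dnorm(x),
                      lower = qnorm(0.95), upper = Inf)$value / 0.05
print(round(c(es95 = es95, es99 = es99, es95.num = es95.num), 4))
```

Multiplying the factor by the volatility forecast gives the expected shortfall of the log return.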


7.3 ECONOMETRIC APPROACH TO VAR CALCULATION 


A general approach to VaR calculation is to use the time series econometric models 
of Chapters 2–4. For a log return series, the time series models of Chapter 2 can
be used to model the mean equation, and the conditional heteroscedastic models 
of Chapter 3 or 4 are used to handle the volatility. For simplicity, we use GARCH 
models in our discussion and refer to the approach as an econometric approach to 
VaR calculation. Other volatility models, including the nonlinear ones in Chapter 4, 
can also be used. 

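Consider the log return \(r_t\) of an asset. A general time series model for \(r_t\) can be
written as

\[
r_t = \phi_0 + \sum_{i=1}^{p} \phi_i r_{t-i} + a_t - \sum_{j=1}^{q} \theta_j a_{t-j}, \tag{7.5}
\]

\[
a_t = \sigma_t\epsilon_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{u} \alpha_i a_{t-i}^2 + \sum_{j=1}^{v} \beta_j \sigma_{t-j}^2. \tag{7.6}
\]

Equations (7.5) and (7.6) are the mean and volatility equations for \(r_t\). These two
equations can be used to obtain 1-step-ahead forecasts of the conditional mean and
conditional variance of \(r_t\) assuming that the parameters are known. Specifically,
we have

\[
\hat{r}_t(1) = \phi_0 + \sum_{i=1}^{p} \phi_i r_{t+1-i} - \sum_{j=1}^{q} \theta_j a_{t+1-j},
\]

\[
\hat{\sigma}_t^2(1) = \alpha_0 + \sum_{i=1}^{u} \alpha_i a_{t+1-i}^2 + \sum_{j=1}^{v} \beta_j \sigma_{t+1-j}^2.
\]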
If one further assumes that \(\epsilon_t\) is Gaussian, then the conditional distribution of
\(r_{t+1}\) given the information available at time \(t\) is \(N[\hat{r}_t(1), \hat{\sigma}_t^2(1)]\). Quantiles of this
conditional distribution can easily be obtained for VaR calculation. For example, the
95% quantile is \(\hat{r}_t(1) + 1.65\hat{\sigma}_t(1)\). If one assumes that \(\epsilon_t\) is a standardized Student-t
distribution with \(v\) degrees of freedom, then the quantile is \(\hat{r}_t(1) + t_v^*(1 - p)\hat{\sigma}_t(1)\),
where \(t_v^*(1 - p)\) is the \((1 - p)\)th quantile of a standardized Student-t distribution
with \(v\) degrees of freedom.

The relationship between quantiles of a Student-t distribution with \(v\) degrees
of freedom, denoted by \(t_v\), and those of its standardized distribution, denoted by
\(t_v^*\), is

\[
p = \Pr(t_v \le q) = \Pr\left(\frac{t_v}{\sqrt{v/(v-2)}} \le \frac{q}{\sqrt{v/(v-2)}}\right) = \Pr\left(t_v^* \le \frac{q}{\sqrt{v/(v-2)}}\right),
\]

where \(v > 2\). That is, if \(q\) is the \(p\)th quantile of a Student-t distribution with
\(v\) degrees of freedom, then \(q/\sqrt{v/(v-2)}\) is the \(p\)th quantile of a standardized
Student-t distribution with \(v\) degrees of freedom. Therefore, if \(\epsilon_t\) of the GARCH
model in Eq. (7.6) is a standardized Student-t distribution with \(v\) degrees of freedom
and the upper tail probability is \(p\), then the \((1 - p)\)th quantile used to calculate
the 1-period horizon VaR at time index \(t\) is

\[
\hat{r}_t(1) + \frac{t_v(1 - p)\,\hat{\sigma}_t(1)}{\sqrt{v/(v-2)}},
\]

where \(t_v(1 - p)\) is the \((1 - p)\)th quantile of a Student-t distribution with \(v\) degrees
of freedom.


Example 7.3. Consider again the daily IBM log returns of Example 7.2. We 
use two volatility models to calculate VaR of 1-day horizon at t = 9190 for a long 
position of $10 million. These econometric models are reasonable based on the 
modeling techniques of Chapters 2 and 3. 

Because the position is long, we use \(r_t = -r_t^c\), where \(r_t^c\) is the usual log return
of IBM stock shown in Figure 7.1.


CASE 1. Assume that \(\epsilon_t\) is standard normal. The fitted model is

\[
r_t = -0.00066 - 0.0247r_{t-2} + a_t, \qquad a_t = \sigma_t\epsilon_t,
\]

\[
\sigma_t^2 = 0.00000389 + 0.0799a_{t-1}^2 + 0.9073\sigma_{t-1}^2.
\]

From the data, we have \(r_{9189} = 0.00201\), \(r_{9190} = 0.0128\), and \(\sigma_{9190}^2 = 0.00033455\).
Consequently, the prior AR(2)-GARCH(1,1) model produces 1-step-ahead forecasts
as

\[
\hat{r}_{9190}(1) = -0.00071 \quad \text{and} \quad \hat{\sigma}_{9190}^2(1) = 0.0003211.
\]

The 95% quantile is then

\[
-0.00071 + 1.6449 \times \sqrt{0.0003211} = 0.02877.
\]

The VaR for a long position of $10 million with probability 0.05 is
VaR = $10,000,000 × 0.02877 = $287,700. The result shows that, with probability
95%, the potential loss of holding that position next day is $287,700 or
less assuming that the AR(2)-GARCH(1,1) model holds. If the tail probability is
0.01, then the 99% quantile is

\[
-0.00071 + 2.3262 \times \sqrt{0.0003211} = 0.0409738.
\]

The VaR for the position becomes $409,738.


CASE 2. Assume that \(\epsilon_t\) is a standardized Student-t distribution with 5 degrees
of freedom. The fitted model is

\[
r_t = -0.0003 - 0.0335r_{t-2} + a_t, \qquad a_t = \sigma_t\epsilon_t,
\]

\[
\sigma_t^2 = 0.000003 + 0.0559a_{t-1}^2 + 0.9350\sigma_{t-1}^2.
\]

From the data, we have \(r_{9189} = 0.00201\), \(r_{9190} = 0.0128\), and \(\sigma_{9190}^2 = 0.000349\).
Consequently, the prior Student-t AR(2)-GARCH(1,1) model produces 1-step-ahead
forecasts

\[
\hat{r}_{9190}(1) = -0.000367 \quad \text{and} \quad \hat{\sigma}_{9190}^2(1) = 0.0003386.
\]

The 95% quantile of a Student-t distribution with 5 degrees of freedom is 2.015
and that of its standardized distribution is \(2.015/\sqrt{5/3} = 1.5608\). Therefore, the
95% quantile of the conditional distribution of \(r_{9191}\) given \(F_{9190}\) is

\[
-0.000367 + 1.5608\sqrt{0.0003386} = 0.028354.
\]

The VaR for a long position of $10 million is

\[
\mathrm{VaR} = \$10{,}000{,}000 \times 0.028352 = \$283{,}520,
\]

which is essentially the same as that obtained under the normality assumption. The
99% quantile of the conditional distribution is

\[
-0.000367 + (3.3649/\sqrt{5/3})\sqrt{0.0003386} = 0.0475943.
\]

The corresponding VaR is $475,943. Comparing with that of Case 1, we see the
heavy-tail effect of using a Student-t distribution with 5 degrees of freedom; it
increases the VaR when the tail probability becomes smaller. In R and S-Plus, the
quantile of a Student-t distribution with \(m\) degrees of freedom can be obtained
by the command qt(p,m), for example, xp = qt(0.99,5.23) for the 99th percentile
of a Student-t distribution with 5.23 degrees of freedom.
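The quantile calculations of Example 7.3 can be retraced in R from the 1-step-ahead forecasts quoted above. The sketch below takes those forecasts as given (it does not refit the models), and the helper `tstar` is a name introduced here; the results agree with the printed values up to rounding.

```r
# Case 1: Gaussian innovations.
rhat.n <- -0.00071    # 1-step-ahead mean forecast
sig2.n <- 0.0003211   # 1-step-ahead variance forecast
q95.n <- rhat.n + qnorm(0.95) * sqrt(sig2.n)
q99.n <- rhat.n + qnorm(0.99) * sqrt(sig2.n)

# Case 2: standardized Student-t innovations with v = 5 degrees of freedom.
v      <- 5
rhat.t <- -0.000367
sig2.t <- 0.0003386
# Quantile of the standardized Student-t: ordinary quantile divided by sqrt(v / (v - 2)).
tstar <- function(prob, v) qt(prob, df = v) / sqrt(v / (v - 2))
q95.t <- rhat.t + tstar(0.95, v) * sqrt(sig2.t)
q99.t <- rhat.t + tstar(0.99, v) * sqrt(sig2.t)

# Dollar VaR for a long position of $10 million.
print(round(1e7 * c(normal.5pct = q95.n, normal.1pct = q99.n,
                    t.5pct = q95.t, t.1pct = q99.t)))
```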
7.3.1 Multiple Periods


Suppose that at time \(h\) we want to compute the \(k\)-horizon VaR of an asset whose
log return is \(r_t\). The variable of interest is the \(k\)-period log return at the forecast
origin \(h\) (i.e., \(r_h[k] = r_{h+1} + \cdots + r_{h+k}\)). If the return \(r_t\) follows the time series
model in Eqs. (7.5) and (7.6), then the conditional mean and variance of \(r_h[k]\)
given the information set \(F_h\) can be obtained by the forecasting methods discussed
in Chapters 2 and 3.


Expected Return and Forecast Error 


The conditional mean \(E(r_h[k] \mid F_h)\) can be obtained by the forecasting method of
ARMA models in Chapter 2. Specifically, we have

\[
\hat{r}_h[k] = r_h(1) + \cdots + r_h(k),
\]

where \(r_h(\ell)\) is the \(\ell\)-step-ahead forecast of the return at the forecast origin \(h\). These
forecasts can be computed recursively as discussed in Section 2.6.4. Using the MA
representation

\[
r_t = \mu + a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots
\]

of the ARMA model in Eq. (7.5), we can write the \(\ell\)-step-ahead forecast error at
the forecast origin \(h\) as

\[
e_h(\ell) = r_{h+\ell} - r_h(\ell) = a_{h+\ell} + \psi_1 a_{h+\ell-1} + \cdots + \psi_{\ell-1} a_{h+1};
\]

see Eq. (2.33) and the associated forecast error. The forecast error of the expected
\(k\)-period return \(\hat{r}_h[k]\) is the sum of 1-step to \(k\)-step forecast errors of \(r_t\) at the
forecast origin \(h\) and can be written as

\[
\begin{aligned}
e_h[k] &= e_h(1) + e_h(2) + \cdots + e_h(k) \\
&= a_{h+1} + (a_{h+2} + \psi_1 a_{h+1}) + \cdots + \sum_{i=0}^{k-1} \psi_i a_{h+k-i} \\
&= a_{h+k} + (1 + \psi_1) a_{h+k-1} + \cdots + \left(\sum_{i=0}^{k-1} \psi_i\right) a_{h+1},
\end{aligned} \tag{7.7}
\]

where \(\psi_0 = 1\).


Expected Volatility 
The volatility forecast of the \(k\)-period return at the forecast origin \(h\) is the conditional
variance of \(e_h[k]\) given \(F_h\). Using the independence assumption of \(\epsilon_{t+i}\) for
\(i = 1, \ldots, k\), where \(a_{t+i} = \sigma_{t+i}\epsilon_{t+i}\), we have

\[
\begin{aligned}
V_h(e_h[k]) &= V_h(a_{h+k}) + (1 + \psi_1)^2 V_h(a_{h+k-1}) + \cdots + \left(\sum_{i=0}^{k-1}\psi_i\right)^2 V_h(a_{h+1}) \\
&= \sigma_h^2(k) + (1 + \psi_1)^2 \sigma_h^2(k-1) + \cdots + \left(\sum_{i=0}^{k-1}\psi_i\right)^2 \sigma_h^2(1),
\end{aligned} \tag{7.8}
\]

where \(V_h(z)\) denotes the conditional variance of \(z\) given \(F_h\) and \(\sigma_h^2(\ell)\) is the
\(\ell\)-step-ahead volatility forecast at the forecast origin \(h\). If the volatility model is
the GARCH model in Eq. (7.6), then these volatility forecasts can be obtained
recursively by the methods discussed in Chapter 3.
As an illustration, consider the special time series model

\[
r_t = \mu + a_t, \qquad a_t = \sigma_t\epsilon_t,
\]

\[
\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1\sigma_{t-1}^2.
\]

Then we have \(\psi_i = 0\) for all \(i > 0\). The point forecast of the \(k\)-period return at the
forecast origin \(h\) is \(\hat{r}_h[k] = k\mu\) and the associated forecast error is

\[
e_h[k] = a_{h+k} + a_{h+k-1} + \cdots + a_{h+1}.
\]

Consequently, the volatility forecast for the \(k\)-period return at the forecast origin
\(h\) is

\[
\mathrm{Var}(e_h[k] \mid F_h) = \sum_{\ell=1}^{k}\sigma_h^2(\ell).
\]

Using the forecasting method of GARCH(1,1) models in Section 3.5, we have

\[
\sigma_h^2(1) = \alpha_0 + \alpha_1 a_h^2 + \beta_1\sigma_h^2,
\]

\[
\sigma_h^2(\ell) = \alpha_0 + (\alpha_1 + \beta_1)\sigma_h^2(\ell - 1), \qquad \ell = 2, \ldots, k. \tag{7.9}
\]

Using Eq. (7.9), we obtain that for the case of \(\psi_i = 0\) for \(i > 0\),

\[
\mathrm{Var}(e_h[k] \mid F_h) = \frac{\alpha_0}{1-\phi}\left(k - \frac{1-\phi^k}{1-\phi}\right) + \frac{1-\phi^k}{1-\phi}\,\sigma_h^2(1), \tag{7.10}
\]

where \(\phi = \alpha_1 + \beta_1 < 1\). If \(\psi_i \ne 0\) for some \(i > 0\), then one should use the general
formula of \(\mathrm{Var}(e_h[k] \mid F_h)\) in Eq. (7.8). If \(\epsilon_t\) is Gaussian, then the conditional distribution
of \(r_h[k]\) given \(F_h\) is normal with mean \(k\mu\) and variance \(\mathrm{Var}(e_h[k] \mid F_h)\). The
quantiles needed in VaR calculations are readily available. If the conditional distribution
of \(a_t\) is not Gaussian (e.g., a Student-t or generalized error distribution),
simulation can be used to obtain the multiperiod VaR.
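The recursion in Eq. (7.9) and the closed form in Eq. (7.10) can be checked against each other numerically. The R sketch below does this for the constant-mean GARCH(1,1) case; the function name `garch_multi_var` is introduced here, and the parameter values are borrowed from Case 1 of Example 7.3 purely as illustrative numbers, with the AR terms of that model ignored.

```r
# k-period variance forecast for r_t = mu + a_t with GARCH(1,1) volatility.
# sig2.1 is the 1-step-ahead variance forecast sigma_h^2(1).
garch_multi_var <- function(alpha0, alpha1, beta1, sig2.1, k) {
  phi <- alpha1 + beta1               # persistence; must be less than 1
  sig2 <- numeric(k)
  sig2[1] <- sig2.1
  if (k > 1) {
    for (l in 2:k) sig2[l] <- alpha0 + phi * sig2[l - 1]   # Eq. (7.9)
  }
  recursion <- sum(sig2)              # sum of the l-step-ahead variance forecasts
  closed <- alpha0 / (1 - phi) * (k - (1 - phi^k) / (1 - phi)) +
    (1 - phi^k) / (1 - phi) * sig2.1  # Eq. (7.10)
  c(recursion = recursion, closed = closed)
}

out <- garch_multi_var(alpha0 = 0.00000389, alpha1 = 0.0799, beta1 = 0.9073,
                       sig2.1 = 0.0003211, k = 15)
print(out)
```

With a Gaussian innovation, the 95% quantile of the k-period return is then k times the mean plus 1.6449 times the square root of this variance.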


Example 7.3 (Continued). Consider the Gaussian AR(2)-GARCH(1,1)
model of Example 7.3 for the daily log returns of IBM stock. Suppose that we
are interested in the VaR of a 15-day horizon starting at the forecast origin 9190
(i.e., December 31, 1998). We can use the fitted model to compute the conditional
mean and variance for the 15-day log return via \(r_{9190}[15] = \sum_{i=1}^{15} r_{9190+i}\)
given \(F_{9190}\). The conditional mean is −0.00998 and the conditional variance is
0.0047948, which is obtained by the recursion in Eq. (7.9). The 95% quantile of
the conditional distribution is then \(-0.00998 + 1.6449\sqrt{0.0047948} = 0.1039191\).
Consequently, the 5% 15-day horizon VaR for a long position of $10 million
is VaR = $10,000,000 × 0.1039191 = $1,039,191. This amount is smaller than
$287,700 × \(\sqrt{15}\) = $1,114,257. This example further demonstrates that the
square root of time rule used by RiskMetrics holds only for the special white
noise IGARCH(1,1) model used. When the conditional mean is not zero, proper
steps must be taken to compute the \(k\)-horizon VaR.


7.3.2 Expected Shortfall under Conditional Normality 


We can use the result of Section 7.2.3 to calculate the ES when the conditional
distribution of the log return is \(N(\mu_t, \sigma_t^2)\). The result is

\[
\mathrm{ES}_q = \mu_t + \frac{f(x_q)}{p}\,\sigma_t,
\]

where \(q = 1 - p\) and \(x_q\) is the \(q\)th quantile of the standard normal distribution.
For instance, if \(p = 0.01\), then \(\mathrm{ES}_{0.99} = \mu_t + 2.6652\sigma_t\).


7.4 QUANTILE ESTIMATION 


Quantile estimation provides a nonparametric approach to VaR calculation. It makes 
no specific distributional assumption on the return of a portfolio except that the 
distribution continues to hold within the prediction period. There are two types of 
quantile methods. The first method is to use empirical quantile directly, and the 
second method uses quantile regression. 


7.4.1 Quantile and Order Statistics 


Assuming that the distribution of return in the prediction period is the same as that 
in the sample period, one can use the empirical quantile of the return r; to calculate 
VaR. Let \(r_1, \ldots, r_n\) be the returns of a portfolio in the sample period. The order
statistics of the sample are these values arranged in increasing order. We use the
notation

\[
r_{(1)} \le r_{(2)} \le \cdots \le r_{(n)}
\]

to denote the arrangement and refer to \(r_{(i)}\) as the \(i\)th order statistic of the sample.
In particular, \(r_{(1)}\) is the sample minimum and \(r_{(n)}\) the sample maximum.

Assume that the returns are independent and identically distributed random variables
that have a continuous distribution with probability density function (pdf)
\(f(x)\) and CDF \(F(x)\). Then we have the following asymptotic result from the statistical
literature [e.g., Cox and Hinkley (1974), Appendix 2], for the order statistic
\(r_{(\ell)}\), where \(\ell = np\) with \(0 < p < 1\).


Result. Let \(x_p\) be the \(p\)th quantile of \(F(x)\), that is, \(x_p = F^{-1}(p)\). Assume that
the pdf \(f(x)\) is not zero at \(x_p\) [i.e., \(f(x_p) \ne 0\)]. Then the order statistic \(r_{(\ell)}\) is
asymptotically normal with mean \(x_p\) and variance \(p(1 - p)/[n f^2(x_p)]\). That is,

\[
r_{(\ell)} \sim N\left(x_p, \frac{p(1 - p)}{n[f(x_p)]^2}\right), \qquad \ell = np. \tag{7.11}
\]


Based on the prior result, one can use $r_{(\ell)}$ to estimate the quantile $x_p$, where
$\ell = np$. In practice, the probability of interest $p$ may not satisfy that $np$ is a positive
integer. In this case, one can use simple interpolation to obtain quantile estimates.
More specifically, for noninteger $np$, let $\ell_1$ and $\ell_2$ be the two neighboring positive
integers such that $\ell_1 < np < \ell_2$. Define $p_i = \ell_i/n$. The previous result shows that
$r_{(\ell_i)}$ is a consistent estimate of the quantile $x_{p_i}$. From the definition, $p_1 < p < p_2$.
Therefore, the quantile $x_p$ can be estimated by

$$\hat{x}_p = \frac{p_2 - p}{p_2 - p_1}\, r_{(\ell_1)} + \frac{p - p_1}{p_2 - p_1}\, r_{(\ell_2)}. \tag{7.12}$$


In practice, sample quantiles can easily be obtained from most statistical packages, 
including R and S-Plus. A demonstration is given after the examples. 
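
The interpolation in Eq. (7.12) can be sketched directly in base R. The function
name interp_quantile and the simulated data are assumptions made for
illustration. The comparison uses quantile() with type = 4, which base R
documents as linear interpolation of the empirical CDF.

```r
# Interpolated empirical quantile of Eq. (7.12).
# interp_quantile() is a hypothetical name; it assumes 1 <= n*p <= n.
interp_quantile <- function(r, p) {
  n  <- length(r)
  sr <- sort(r)                       # order statistics r_(1) <= ... <= r_(n)
  np <- n * p
  l1 <- floor(np)
  if (l1 == np) return(sr[l1])        # np is an integer: use r_(np) directly
  l2 <- l1 + 1
  p1 <- l1 / n
  p2 <- l2 / n
  (p2 - p) / (p2 - p1) * sr[l1] + (p - p1) / (p2 - p1) * sr[l2]
}

set.seed(1)
x <- rnorm(1001)                      # simulated data, for illustration only
print(interp_quantile(x, 0.95))
print(quantile(x, 0.95, type = 4))    # base R, linear interpolation of the ECDF
```

The default quantile() in R uses a different interpolation rule (type = 7), so
its output can differ slightly from Eq. (7.12) in small samples.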


Example 7.4. Consider the daily log returns of Intel stock from December 15, 
1972, to December 31, 2008. There are 9096 observations. For a long position in the 
Intel stock, we consider the negative log returns. Since $9096 \times 0.95 = 8641.2$, we
have $\ell_1 = 8641$, $\ell_2 = 8642$, $p_1 = 8641/9096$, and $p_2 = 8642/9096$. The empirical
95% quantile of the negative log returns can be obtained as

$$\hat{x}_{0.95} = 0.8\, r_{(8641)} + 0.2\, r_{(8642)} = 4.2952\%,$$

where $r_{(i)}$ is the $i$th order statistic of the negative log returns. In this particular instance,
$r_{(8641)} = 4.2951\%$ and $r_{(8642)} = 4.2954\%$.
R Demonstration


> da=read.table("d-intc7208.txt",header=T)
> intc=log(da[,2]+1)
> nintc=-intc
> quantile(nintc,0.95)
       95%
0.04295213
> quantile(intc,.05)   # An alternative
         5%
-0.04295213


Example 7.5. Consider again the daily log returns of IBM stock from July 3,
1962, to December 31, 1998. Using all 9190 observations, the empirical 95%
quantile of the negative log returns can be obtained as $(r_{(8730)} + r_{(8731)})/2 = 0.021603$,
where $r_{(i)}$ is the $i$th order statistic and $np = 9190 \times 0.95 = 8730.5$. The VaR of
a long position of $10 million is $216,030, which is much smaller than those
obtained by the econometric approach discussed before. Because the sample size is
9190, we have $9098 < 9190 \times 0.99 < 9099$. Let $p_1 = 9098/9190 = 0.98999$ and
$p_2 = 9099/9190 = 0.9901$. The empirical 99% quantile, in percentages, can be obtained as

$$\begin{aligned}
\hat{x}_{0.99} &= \frac{p_2 - 0.99}{p_2 - p_1}\, r_{(9098)} + \frac{0.99 - p_1}{p_2 - p_1}\, r_{(9099)} \\
&= \frac{0.0001}{0.00011}(3.627) + \frac{0.00001}{0.00011}(3.657) \\
&= 3.630.
\end{aligned}$$

The 1% 1-day horizon VaR of the long position is $363,000. Again this amount
is lower than those obtained before by other methods.


Discussion. Advantages of using the empirical quantile method for VaR
calculation include (a) simplicity and (b) using no specific distributional assumption.
However, the approach has several drawbacks. First, it assumes that the distribution
of the return $r_t$ remains unchanged from the sample period to the prediction period.
Given that VaR is concerned mainly with tail probability, this assumption implies 
that the predicted loss cannot be greater than that of the historical loss. It is defi- 
nitely not so in practice. Second, when the tail probability p is small, the empirical 
quantile is not an efficient estimate of the theoretical quantile. Third, the direct 
quantile estimation fails to take into account the effect of explanatory variables 
that are relevant to the portfolio under study. In real application, VaR obtained by 
the empirical quantile can serve as a lower bound for the actual VaR. 


The expected shortfall can also be estimated directly from the sample returns.
Let $\hat{x}_q$ be the empirical $q$th quantile, where $q = 1 - p$ with $p$ being the upper tail
probability. We have

$$ES_q = \frac{1}{N_q} \sum_{i=1}^{n} x_{(i)}\, I[x_{(i)} > \hat{x}_q],$$

where $I[\cdot] = 1$ if $x_{(i)} > \hat{x}_q$ and $= 0$ otherwise, and $N_q$ denotes the number of $x_{(i)}$
greater than $\hat{x}_q$. For illustration, consider the negative IBM daily log returns. If
$p = 0.01$, we have $\hat{x}_{0.99} = 3.630$. Therefore, $ES_{0.99} = 5.097$.


R Demonstration


> da=read.table("d-ibm6298.txt",header=T)
> ibm=log(da[,2]+1)*100
> nibm=-ibm
> q99=quantile(nibm,0.99)
> q99
     99%
3.630295
> idx=c(1:length(nibm))[nibm>q99]   # locate the exceedances
> es=mean(nibm[idx])
> es
[1] 5.097222


7.4.2 Quantile Regression 


In real application, one often has explanatory variables available that are important 
to the problem under study. For example, the action taken by Federal Reserve 
Banks on interest rates could have important impacts on the returns of U.S. 
stocks. It is then more appropriate to consider the distribution function of $r_{t+1} | F_t$,
where $F_t$ includes the explanatory variables. In other words, we are interested
in the quantiles of the distribution function of $r_{t+1}$ given $F_t$. Such a quantile
is referred to as a regression quantile in the literature; see Koenker and Bassett 
(1978). 

To understand regression quantile, it is helpful to cast the empirical quantile of 
the previous subsection as an estimation problem. For a given probability p, the 
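$p$th quantile of $\{r_t\}$ is obtained by

$$\hat{x}_p = \operatorname{argmin}_{\beta} \sum_{i=1}^{n} w_p(r_i - \beta),$$

where $w_p(z)$ is defined by

$$w_p(z) = \begin{cases} pz & \text{if } z \ge 0, \\ (p - 1)z & \text{if } z < 0. \end{cases}$$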


Regression quantile is a generalization of such an estimate. 
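
To see that minimizing the loss $w_p$ reproduces a sample quantile, consider the
following minimal sketch in base R. The function name check_loss and the
simulated data are assumptions for illustration. Because the objective is
piecewise linear in $\beta$, the search is restricted to the data points.

```r
# Check loss w_p(z) and the sample quantile as its minimizer.
# check_loss() and the simulated data are illustrative assumptions.
check_loss <- function(z, p) ifelse(z >= 0, p * z, (p - 1) * z)

set.seed(2)
r <- rnorm(101)
p <- 0.95
# Evaluate the objective at every observed value; a minimizer of a
# piecewise-linear convex function can be found among its kinks.
obj <- sapply(r, function(b) sum(check_loss(r - b, p)))
beta_hat <- r[which.min(obj)]
print(beta_hat)
print(quantile(r, p, type = 1))   # inverse-ECDF sample quantile
```

Here $np = 95.95$ is not an integer, so the minimizer is the single order
statistic $r_{(96)}$, which is what the inverse-ECDF quantile returns.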
To see the generalization, suppose that we have the linear regression

$$r_t = \boldsymbol{\beta}'\boldsymbol{x}_t + a_t, \tag{7.13}$$

where $\boldsymbol{\beta}$ is a $k$-dimensional vector of parameters and $\boldsymbol{x}_t$ is a vector of predictors that
are elements of $F_{t-1}$. The conditional distribution of $r_t$ given $F_{t-1}$ is a translation
of the distribution of $a_t$ because $\boldsymbol{\beta}'\boldsymbol{x}_t$ is known. Viewing the problem this way,
Koenker and Bassett (1978) suggest estimating the conditional quantile $x_p | F_{t-1}$ of
$r_t$ given $F_{t-1}$ as

$$\hat{x}_p | F_{t-1} = \inf\{\boldsymbol{\beta}_o'\boldsymbol{x}_t \mid R_p(\boldsymbol{\beta}_o) = \min\}, \tag{7.14}$$

where "$R_p(\boldsymbol{\beta}_o) = \min$" means that $\boldsymbol{\beta}_o$ is obtained by

$$\boldsymbol{\beta}_o = \operatorname{argmin}_{\boldsymbol{\beta}} \sum_{t=1}^{n} w_p(r_t - \boldsymbol{\beta}'\boldsymbol{x}_t),$$

where $w_p(\cdot)$ is defined as before. A computer program to obtain such an estimated
quantile can be found in Koenker and D'Orey (1987). The package quantreg of
R performs quantile regression analysis.
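
The minimization behind Eq. (7.14) can be sketched without any add-on package.
The code below is an approximation under stated assumptions: the data are
simulated, all names are hypothetical, and optim() with its default
Nelder-Mead search is used in place of the linear-programming algorithm that
dedicated software employs.

```r
# Linear quantile regression by direct minimization of the check loss.
# A sketch only: optim() with Nelder-Mead gives an approximate minimizer;
# the simulated model and all names here are illustrative assumptions.
check_loss <- function(z, p) ifelse(z >= 0, p * z, (p - 1) * z)

set.seed(3)
n <- 500
x <- runif(n)
r <- 1 + 2 * x + rnorm(n)
p <- 0.90

obj <- function(b) sum(check_loss(r - b[1] - b[2] * x, p))
start <- coef(lm(r ~ x))                 # least-squares fit as starting value
fit <- optim(start, obj)                 # Nelder-Mead by default
fit <- optim(fit$par, obj)               # one restart to tighten the solution
beta_o <- fit$par
fitted_q <- beta_o[1] + beta_o[2] * x    # estimated 90% conditional quantile
print(beta_o)
print(mean(r <= fitted_q))               # share of points at or below the line
```

Roughly 90% of the observations should fall at or below the fitted line. With
the quantreg package installed, a call such as rq(r ~ x, tau = 0.9) should
give a comparable fit.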


7.5 EXTREME VALUE THEORY 


In this section, we review some extreme value theory in the statistical literature. 
Denote the return of an asset, measured in a fixed time interval such as daily,
by $r_t$. Consider the collection of $n$ returns, $\{r_1, \ldots, r_n\}$. The minimum return of
the collection is $r_{(1)}$, that is, the smallest order statistic, whereas the maximum
return is $r_{(n)}$, the maximum order statistic. Specifically, $r_{(1)} = \min_{1 \le j \le n}\{r_j\}$ and
$r_{(n)} = \max_{1 \le j \le n}\{r_j\}$. Following the literature and using the loss function in VaR
calculation, we focus on properties of the maximum return $r_{(n)}$. However, the theory
discussed also applies to the minimum return of an asset over a given time period
because properties of the minimum return can be obtained from those of the
maximum by a simple sign change. Specifically, we have $r_{(1)} = -\max_{1 \le j \le n}\{-r_j\} =
-r^c_{(n)}$, where $r^c_t = -r_t$ with the superscript $c$ denoting sign change. The minimum
return is relevant to holding a long financial position. As before, we shall use
negative log returns, instead of the log returns, to perform VaR calculation for a long
position.


7.5.1 Review of Extreme Value Theory 


Assume that the returns $r_t$ are serially independent with a common cumulative
distribution function $F(x)$ and that the range of the return $r_t$ is $[l, u]$. For log
returns, we have $l = -\infty$ and $u = \infty$. Then the CDF of $r_{(n)}$, denoted by $F_{n,n}(x)$,
is given by

$$\begin{aligned}
F_{n,n}(x) &= \Pr[r_{(n)} \le x] \\
&= \Pr(r_1 \le x, r_2 \le x, \ldots, r_n \le x) && \text{(by definition of maximum)} \\
&= \prod_{j=1}^{n} \Pr(r_j \le x) && \text{(by independence)} \\
&= \prod_{j=1}^{n} F(x) = [F(x)]^n.
\end{aligned} \tag{7.15}$$


In practice, the CDF $F(x)$ of $r_t$ is unknown and, hence, $F_{n,n}(x)$ of $r_{(n)}$ is unknown.
However, as $n$ increases to infinity, $F_{n,n}(x)$ becomes degenerate; namely,
$F_{n,n}(x) \to 0$ if $x < u$ and $F_{n,n}(x) \to 1$ if $x \ge u$ as $n$ goes to infinity. This
degenerate CDF has no practical value. Therefore, the extreme value theory is
concerned with finding two sequences $\{\beta_n\}$ and $\{\alpha_n\}$, where $\alpha_n > 0$, such that the
distribution of $r_{(n*)} \equiv (r_{(n)} - \beta_n)/\alpha_n$ converges to a nondegenerate distribution as
$n$ goes to infinity. The sequence $\{\beta_n\}$ is a location series and $\{\alpha_n\}$ is a series of
scaling factors. Under the independence assumption, the limiting distribution of the
normalized maximum $r_{(n*)}$ is given by

$$F_*(x) = \begin{cases} \exp[-(1 + \xi x)^{-1/\xi}] & \text{if } \xi \ne 0, \\ \exp[-\exp(-x)] & \text{if } \xi = 0, \end{cases} \tag{7.16}$$

for $x < -1/\xi$ if $\xi < 0$ and for $x > -1/\xi$ if $\xi > 0$, where the subscript $*$ signifies
the maximum. The case of $\xi = 0$ is taken as the limit when $\xi \to 0$. The parameter
$\xi$ is referred to as the shape parameter that governs the tail behavior of the limiting
distribution. The parameter $\alpha = 1/\xi$ is called the tail index of the distribution.

The limiting distribution in Eq. (7.16) is the generalized extreme value (GEV) 
distribution of Jenkinson (1955) for the maximum. It encompasses the three types 
of limiting distribution of Gnedenko (1943): 


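• Type I: $\xi = 0$, the Gumbel family. The CDF is

$$F_*(x) = \exp[-\exp(-x)], \qquad -\infty < x < \infty. \tag{7.17}$$

• Type II: $\xi > 0$, the Fréchet family. The CDF is

$$F_*(x) = \begin{cases} \exp[-(1 + \xi x)^{-1/\xi}] & \text{if } x > -1/\xi, \\ 0 & \text{otherwise.} \end{cases} \tag{7.18}$$

• Type III: $\xi < 0$, the Weibull family. The CDF here is

$$F_*(x) = \begin{cases} \exp[-(1 + \xi x)^{-1/\xi}] & \text{if } x < -1/\xi, \\ 1 & \text{otherwise.} \end{cases}$$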


Figure 7.2 Probability density functions of extreme value distributions for maximum. Solid line is
for Gumbel distribution, dotted line is for Weibull distribution with $\xi = -0.5$, and dashed line is for
Fréchet distribution with $\xi = 0.9$.


Gnedenko (1943) gave necessary and sufficient conditions for the CDF $F(x)$ of $r_t$
to be associated with one of the three types of limiting distribution. Briefly
speaking, the tail behavior of $F(x)$ determines the limiting distribution $F_*(x)$ of the
maximum. The right tail of the distribution declines exponentially for the Gumbel
family, by a power function for the Fréchet family, and is finite for the Weibull
family (Figure 7.2). Readers are referred to Embrechts, Klüppelberg, and Mikosch
(1997) for a comprehensive treatment of the extreme value theory. For risk
management, we are mainly interested in the Fréchet family, which includes stable
and Student-t distributions. The Gumbel family consists of thin-tailed distributions
such as normal and lognormal distributions. The probability density function (pdf)
of the generalized limiting distribution in Eq. (7.16) can be obtained easily by
differentiation:

$$f_*(x) = \begin{cases} (1 + \xi x)^{-1/\xi - 1} \exp[-(1 + \xi x)^{-1/\xi}] & \text{if } \xi \ne 0, \\ \exp[-x - \exp(-x)] & \text{if } \xi = 0, \end{cases} \tag{7.19}$$

where $-\infty < x < \infty$ for $\xi = 0$, and $x < -1/\xi$ for $\xi < 0$, and $x > -1/\xi$ for $\xi > 0$.
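
The CDF in Eq. (7.16) and the pdf in Eq. (7.19) can be coded in a few lines,
and the claim that the pdf follows by differentiation can be checked
numerically. This is a sketch in base R; the names pgev_std and dgev_std are
hypothetical, and the shape values mirror those used in Figure 7.2.

```r
# CDF and pdf of the standardized GEV distribution, Eqs. (7.16) and (7.19).
# pgev_std() and dgev_std() are hypothetical names chosen for this sketch.
pgev_std <- function(x, xi) {
  if (xi == 0) return(exp(-exp(-x)))
  z <- pmax(1 + xi * x, 0)             # z = 0 marks points outside the support
  exp(-z^(-1 / xi))                    # gives 0 (xi > 0) or 1 (xi < 0) there
}

dgev_std <- function(x, xi) {
  if (xi == 0) return(exp(-x - exp(-x)))
  z <- 1 + xi * x
  out <- numeric(length(x))            # density is 0 outside the support
  ok <- z > 0
  out[ok] <- z[ok]^(-1 / xi - 1) * exp(-z[ok]^(-1 / xi))
  out
}

# Compare a central-difference derivative of the CDF with the pdf.
x <- c(-1, 0, 1, 3)
h <- 1e-6
errs <- sapply(c(-0.5, 0, 0.9), function(xi) {
  num <- (pgev_std(x + h, xi) - pgev_std(x - h, xi)) / (2 * h)
  max(abs(num - dgev_std(x, xi)))
})
print(errs)                            # one maximum error per shape value
```

The three errors should all be negligible, including at $x = 3$, which lies
outside the support of the Weibull case with $\xi = -0.5$.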

The aforementioned extreme value theory has two important implications. First,
the tail behavior of the CDF $F(x)$ of $r_t$, not the specific distribution, determines
the limiting distribution $F_*(x)$ of the (normalized) maximum. Thus, the theory is
generally applicable to a wide range of distributions for the return $r_t$. The sequences
$\{\beta_n\}$ and $\{\alpha_n\}$, however, may depend on the CDF $F(x)$. Second, Feller (1971,
p. 279) shows that the tail index $\xi$ does not depend on the time interval of $r_t$.
That is, the tail index (or equivalently the shape parameter) is invariant under time
aggregation. This second feature of the limiting distribution becomes handy in the
VaR calculation.

The extreme value theory has been extended to serially dependent
observations $\{r_t\}$ provided that the dependence is weak. Berman (1964) shows that the
same form of the limiting extreme value distribution holds for stationary normal
sequences provided that the autocorrelation function of $r_t$ is square summable
(i.e., $\sum_{i=1}^{\infty} \rho_i^2 < \infty$), where $\rho_i$ is the lag-$i$ autocorrelation function of $r_t$. For
further results concerning the effect of serial dependence on the extreme value theory,
readers are referred to Leadbetter, Lindgren, and Rootzén (1983, Chapter 3). We
shall discuss extremal index for a strictly stationary time series later in Section 7.8.


7.5.2 Empirical Estimation 


The extreme value distribution contains three parameters: $\xi$, $\beta_n$, and $\alpha_n$. These
parameters are referred to as the shape, location, and scale parameters,
respectively. They can be estimated by using either parametric or nonparametric methods.
We review some of the estimation methods. 

For a given sample, there is only a single minimum or maximum, and we cannot 
estimate the three parameters with only an extreme observation. Alternative ideas 
must be used. One of the ideas used in the literature is to divide the sample into 
subsamples and apply the extreme value theory to the subsamples. Assume that 
there are $T$ returns $\{r_j\}_{j=1}^{T}$ available. We divide the sample into $g$ nonoverlapping
subsamples each with $n$ observations, assuming for simplicity that $T = ng$. In other
words, we divide the data as

$$\{r_1, \ldots, r_n \mid r_{n+1}, \ldots, r_{2n} \mid r_{2n+1}, \ldots, r_{3n} \mid \cdots \mid r_{(g-1)n+1}, \ldots, r_{ng}\}$$

and write the observed returns as $r_{in+j}$, where $1 \le j \le n$ and $i = 0, \ldots, g - 1$.
Note that each subsample corresponds to a subperiod of the data span. When n is 
sufficiently large, we hope that the extreme value theory applies to each subsample. 
In application, the choice of n can be guided by practical considerations. For 
example, for daily returns, n = 21 corresponds approximately to the number of 
trading days in a month and n = 63 denotes the number of trading days in a 
quarter. 

Let $r_{n,i}$ be the maximum of the $i$th subsample (i.e., $r_{n,i}$ is the largest return of the
$i$th subsample), where the subscript $n$ is used to denote the size of the subsample.
When $n$ is sufficiently large, $x_{n,i} = (r_{n,i} - \beta_n)/\alpha_n$ should follow an extreme value
distribution, and the collection of subsample maxima $\{r_{n,i} \mid i = 1, \ldots, g\}$ can then
be regarded as a sample of $g$ observations from that extreme value distribution.
Specifically, we define

$$r_{n,i} = \max_{1 \le j \le n}\{r_{(i-1)n+j}\}, \qquad i = 1, \ldots, g. \tag{7.20}$$

The collection of subsample maxima $\{r_{n,i}\}$ is the data we use to estimate the
unknown parameters of the extreme value distribution. Clearly, the estimates
obtained may depend on the choice of subperiod length $n$.


Remark. When T is not a multiple of the subsample size n, several methods 
have been used to deal with this issue. First, one can allow the last subsample to 
have a smaller size. Second, one can ignore the first few observations so that each 
subsample has size n. 
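

The construction of subsample maxima in Eq. (7.20) can be sketched as follows.
The helper name block_maxima is hypothetical, the returns are simulated, and
the function adopts the second option of the remark: it drops the first few
observations when $T$ is not a multiple of $n$.

```r
# Subperiod (block) maxima of Eq. (7.20).
# block_maxima() is a hypothetical helper; when T is not a multiple of n it
# drops the first T - n*g observations, the second option in the remark.
block_maxima <- function(r, n) {
  g <- length(r) %/% n                   # number of complete subperiods
  if (g < 1) stop("series is shorter than one subperiod")
  r <- tail(r, n * g)                    # keep the last n*g observations
  apply(matrix(r, nrow = n, ncol = g), 2, max)   # one column per subperiod
}

print(block_maxima(1:10, 3))             # subperiods (2,3,4), (5,6,7), (8,9,10)
set.seed(4)
r <- rt(9190, df = 4)                    # simulated heavy-tailed "returns"
print(length(block_maxima(r, 21)))       # number of complete subperiods
```

For a long position one would pass the negative log returns, so that the
maxima correspond to the largest losses in each subperiod.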


The Parametric Approach 
Two parametric approaches are available. They are the maximum-likelihood and 
regression methods. 


Maximum-Likelihood Method 

Assuming that the subperiod maxima $\{r_{n,i}\}$ follow a generalized extreme value
distribution such that the pdf of $x_i = (r_{n,i} - \beta_n)/\alpha_n$ is given in Eq. (7.19), we can
obtain the pdf of $r_{n,i}$ by a simple transformation as

$$f(r_{n,i}) = \begin{cases}
\dfrac{1}{\alpha_n}\left[1 + \dfrac{\xi_n (r_{n,i} - \beta_n)}{\alpha_n}\right]^{-(1 + \xi_n)/\xi_n} \exp\left\{-\left[1 + \dfrac{\xi_n (r_{n,i} - \beta_n)}{\alpha_n}\right]^{-1/\xi_n}\right\} & \text{if } \xi_n \ne 0, \\[2ex]
\dfrac{1}{\alpha_n} \exp\left[-\dfrac{r_{n,i} - \beta_n}{\alpha_n} - \exp\left(-\dfrac{r_{n,i} - \beta_n}{\alpha_n}\right)\right] & \text{if } \xi_n = 0,
\end{cases}$$

where it is understood that $1 + \xi_n (r_{n,i} - \beta_n)/\alpha_n > 0$ if $\xi_n \ne 0$. The subscript $n$ is
added to the shape parameter $\xi$ to signify that its estimate depends on the choice
of $n$. Under the independence assumption, the likelihood function of the subperiod
maxima is

$$\ell(r_{n,1}, \ldots, r_{n,g} \mid \xi_n, \alpha_n, \beta_n) = \prod_{i=1}^{g} f(r_{n,i}).$$

Nonlinear estimation procedures can then be used to obtain maximum-likelihood
estimates of $\xi_n$, $\beta_n$, and $\alpha_n$. These estimates are unbiased, asymptotically normal,
and of minimum variance under proper assumptions. See Embrechts et al. (1997) 
and Coles (2001) for details. We apply this approach to some stock return series 
later. 
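

The likelihood above can be maximized numerically with a general-purpose
optimizer. The following is a base-R sketch for the case $\xi_n \ne 0$, fitted
to simulated maxima with known parameters. The function names, the starting
values, and the use of optim() are assumptions of this sketch; it is not the
Fortran program or the package routine used for the results in this chapter.

```r
# Maximum-likelihood fit of the GEV distribution to subperiod maxima,
# a base-R sketch for the case xi != 0. All function names are hypothetical.
gev_nll <- function(par, maxima) {
  xi <- par[1]; alpha <- par[2]; beta <- par[3]
  if (alpha <= 0 || abs(xi) < 1e-8) return(1e10)
  z <- 1 + xi * (maxima - beta) / alpha
  if (any(z <= 0)) return(1e10)          # outside the support
  # negative log of the product of f(r_{n,i})
  length(maxima) * log(alpha) + (1 + 1 / xi) * sum(log(z)) + sum(z^(-1 / xi))
}

fit_gev <- function(maxima) {
  a0 <- sqrt(6) * sd(maxima) / pi        # Gumbel moment estimates as start
  b0 <- mean(maxima) - 0.5772 * a0
  optim(c(0.1, a0, b0), gev_nll, maxima = maxima,
        control = list(maxit = 2000))
}

# Simulated maxima with known parameters (xi, alpha, beta) = (0.2, 1, 2),
# drawn by inverting the CDF in Eq. (7.16).
set.seed(5)
u <- runif(2000)
m <- 2 + 1 * ((-log(u))^(-0.2) - 1) / 0.2
fit <- fit_gev(m)
print(round(fit$par, 3))                 # estimates of (xi, alpha, beta)
```

The estimates should be close to the true values used in the simulation.
Standard errors could be obtained by adding hessian = TRUE to the optim() call
and inverting the returned Hessian.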


Regression Method 

This method assumes that $\{r_{n,i}\}_{i=1}^{g}$ is a random sample from the generalized extreme
value distribution in Eq. (7.16) and makes use of properties of order statistics; see
Gumbel (1958). Denote the order statistics of the subperiod maxima $\{r_{n,i}\}_{i=1}^{g}$ as

$$r_{n(1)} \le r_{n(2)} \le \cdots \le r_{n(g)}.$$

Using properties of order statistics (e.g., Cox and Hinkley, 1974, p. 467),
we have

$$E\{F_*[r_{n(i)}]\} = \frac{i}{g + 1}, \qquad i = 1, \ldots, g. \tag{7.21}$$

For simplicity, we separate the discussion into two cases depending on the value
of $\xi$. First, consider the case of $\xi \ne 0$. From Eq. (7.16), we have

$$F_*[r_{n(i)}] = \exp\left\{-\left[1 + \xi_n \frac{r_{n(i)} - \beta_n}{\alpha_n}\right]^{-1/\xi_n}\right\}. \tag{7.22}$$

Consequently, using Eqs. (7.21) and (7.22) and approximating expectation by an
observed value, we have

$$\frac{i}{g + 1} = \exp\left\{-\left[1 + \xi_n \frac{r_{n(i)} - \beta_n}{\alpha_n}\right]^{-1/\xi_n}\right\}, \qquad i = 1, \ldots, g.$$

Taking natural logarithm twice, the prior equation gives

$$\ln\left[-\ln\left(\frac{i}{g + 1}\right)\right] = \frac{-1}{\xi_n} \ln\left[1 + \xi_n \frac{r_{n(i)} - \beta_n}{\alpha_n}\right], \qquad i = 1, \ldots, g.$$

In practice, letting $e_i$ be the deviation between the previous two quantities and
assuming that the series $\{e_i\}$ is not serially correlated, we have a regression setup

$$\ln\left[-\ln\left(\frac{i}{g + 1}\right)\right] = \frac{-1}{\xi_n} \ln\left[1 + \xi_n \frac{r_{n(i)} - \beta_n}{\alpha_n}\right] + e_i, \qquad i = 1, \ldots, g. \tag{7.23}$$

The least-squares estimates of $\xi_n$, $\beta_n$, and $\alpha_n$ can be obtained by minimizing the
sum of squares of $e_i$.
When $\xi_n = 0$, the regression setup reduces to

$$\ln\left[-\ln\left(\frac{i}{g + 1}\right)\right] = \frac{\beta_n}{\alpha_n} - \frac{1}{\alpha_n}\, r_{n(i)} + e_i, \qquad i = 1, \ldots, g.$$


The least-squares estimates are consistent but less efficient than the likelihood 
estimates. We use the likelihood estimates in this chapter. 
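
For the Gumbel case ($\xi_n = 0$) the regression setup is linear and can be fitted
with lm(). The sketch below uses simulated Gumbel maxima with known
parameters; the data and all variable names are assumptions for illustration.

```r
# Regression method for the Gumbel case (xi_n = 0):
# ln[-ln(i/(g+1))] = beta_n/alpha_n - r_{n(i)}/alpha_n + e_i.
# Simulated maxima and all names are illustrative assumptions.
set.seed(6)
g <- 1000
m <- 2 - 1 * log(-log(runif(g)))       # Gumbel maxima with beta = 2, alpha = 1
ms <- sort(m)                          # order statistics r_{n(1)} <= ... <= r_{n(g)}
y <- log(-log((1:g) / (g + 1)))
cf <- coef(lm(y ~ ms))                 # intercept = beta/alpha, slope = -1/alpha
alpha_hat <- unname(-1 / cf[2])
beta_hat  <- unname(cf[1] * alpha_hat)
print(c(alpha = alpha_hat, beta = beta_hat))
```

The scale estimate comes from the slope and the location estimate from the
intercept. For $\xi_n \ne 0$ the setup in Eq. (7.23) is nonlinear in the
parameters and would require a nonlinear least-squares routine instead.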
The Nonparametric Approach

The shape parameter $\xi$ can be estimated using some nonparametric methods. We
mention two such methods here. These two methods are proposed by Hill (1975)
and Pickands (1975) and are referred to as the Hill estimator and Pickands estimator,
respectively. Both estimators apply directly to the returns $\{r_t\}_{t=1}^{T}$. Thus, there is no
need to consider subsamples. Denote the order statistics of the sample as

$$r_{(1)} \le r_{(2)} \le \cdots \le r_{(T)}.$$

Let $q$ be a positive integer. The two estimators of $\xi$ are defined as

$$\xi_p(q) = \frac{1}{\ln(2)} \ln\left[\frac{r_{(T-q+1)} - r_{(T-2q+1)}}{r_{(T-2q+1)} - r_{(T-4q+1)}}\right], \qquad q \le T/4, \tag{7.24}$$

$$\xi_h(q) = \frac{1}{q} \sum_{i=1}^{q} \left\{\ln[r_{(T-i+1)}] - \ln[r_{(T-q)}]\right\}, \tag{7.25}$$

where the argument $(q)$ is used to emphasize that the estimators depend on $q$ and the
subscripts $p$ and $h$ denote Pickands and Hill estimators, respectively. The choice of
$q$ differs between Hill and Pickands estimators. It has been investigated by several
researchers, but there is no general consensus on the best choice available. Dekkers
and De Haan (1989) show that $\xi_p(q)$ is consistent if $q$ increases at a properly chosen
pace with the sample size $T$. In addition, $\sqrt{q}[\xi_p(q) - \xi]$ is asymptotically normal
with mean zero and variance $\xi^2(2^{2\xi+1} + 1)/[2(2^{\xi} - 1)\ln(2)]^2$. The Hill estimator is
applicable to the Fréchet distribution only, but it is more efficient than the Pickands
estimator when applicable. Goldie and Smith (1987) show that $\sqrt{q}[\xi_h(q) - \xi]$ is
asymptotically normal with mean zero and variance $\xi^2$. In practice, one may plot the
Hill estimator $\xi_h(q)$ against $q$ and find a proper $q$ such that the estimate appears to
be stable. The estimated tail index $\alpha = 1/\xi_h(q)$ can then be used to obtain extreme
quantiles of the return series; see Zivot and Wang (2003).
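
Both estimators in Eqs. (7.24) and (7.25) are short to code. The sketch below
applies them to a simulated Pareto sample whose true shape parameter is known;
the function names and the simulated data are assumptions for illustration.

```r
# Pickands and Hill estimators of the shape parameter, Eqs. (7.24)-(7.25).
# Function names are hypothetical; x must be positive in the upper tail
# for the Hill estimator because it takes logarithms.
pickands <- function(x, q) {
  n <- length(x)
  stopifnot(q <= n / 4)
  s <- sort(x)
  log((s[n - q + 1] - s[n - 2 * q + 1]) /
      (s[n - 2 * q + 1] - s[n - 4 * q + 1])) / log(2)
}

hill_est <- function(x, q) {
  n <- length(x)
  s <- sort(x)
  mean(log(s[(n - q + 1):n])) - log(s[n - q])
}

set.seed(7)
x <- runif(20000)^(-0.5)     # Pareto sample with true shape xi = 0.5
print(pickands(x, 1000))
print(hill_est(x, 1000))
```

Both estimates should be near 0.5, with the Hill estimate typically the
closer of the two, in line with its smaller asymptotic variance.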


7.5.3 Application to Stock Returns 


We apply the extreme value theory to the daily log returns of IBM stock from July 
3, 1962, to December 31, 1998. The returns are measured in percentages, and the 
sample size is 9190 (i.e., T = 9190). Figure 7.3 shows the time plots of extreme 
daily log returns when the length of the subperiod is 21 days, which corresponds 
approximately to a month. The October 1987 crash is clearly seen from the plot. 
Excluding the 1987 crash, the range of extreme daily log returns is between 0.5 
and 13%. 

Figure 7.3 Maximum and minimum daily log returns of IBM stock when subperiod is 21 trading days.
Data span is from July 3, 1962, to December 31, 1998: (a) positive returns and (b) negative returns.

Table 7.1 summarizes some estimation results of the shape parameter $\xi$ via
the Hill estimator. Three choices of $q$ are reported in the table, and the results
are stable. To provide an overall picture of the performance of the Hill estimator,
Figure 7.4 shows the scatterplots of the Hill estimator $\xi_h(q)$ and its pointwise 95%
confidence interval against $q$. For both positive and negative extreme daily log
returns, the estimator is stable except for cases when $q$ is small. The estimated
shape parameters are about 0.30 and are significantly different from zero at the
asymptotic 5% level. The plots also indicate that the shape parameter $\xi$ appears to
be larger for the negative extremes, indicating that the daily log return may have a
heavier left tail. Overall, the result indicates that the distribution of daily log returns
of IBM stock belongs to the Fréchet family. The analysis thus rejects the normality
assumption commonly used in practice. Such a conclusion is in agreement with
that of Longin (1996), who used a U.S. stock market index series. R and S-Plus
commands used to perform the analysis are given in the demonstration below.


TABLE 7.1 Results of Hill Estimator for Daily Log Returns of IBM Stock from July
3, 1962, to December 31, 1998^a

q         190             200             210
$r_t$     0.300(0.022)    0.299(0.021)    0.305(0.021)
$-r_t$    0.290(0.021)    0.292(0.021)    0.289(0.020)

^a Standard errors are in parentheses.


Figure 7.4 Scatterplots of Hill estimator for daily log returns of IBM stock. Sample period is from
July 3, 1962, to December 31, 1998: upper plot is for positive returns and lower one for negative
returns.

Next, we apply the maximum-likelihood method to estimate parameters of the
generalized extreme value distribution for IBM daily log returns. Table 7.2
summarizes the estimation results for different choices of the length of subperiods ranging
from 1 month ($n = 21$) to 1 year ($n = 252$). From the table, we make the following
observations:


• Estimates of the location and scale parameters $\beta_n$ and $\alpha_n$ increase in modulus
as $n$ increases. This is expected as magnitudes of the subperiod minimum and
maximum are nondecreasing functions of $n$.


• Estimates of the shape parameter (or equivalently the tail index) are stable for
the negative extremes when $n \ge 63$ and are approximately 0.33.


• Estimates of the shape parameter are less stable for the positive extremes. The
estimates are smaller in magnitude but remain significantly different from zero.


• The results for $n = 252$ have higher variabilities as the number of subperiods
$g$ is relatively small.


TABLE 7.2 Maximum-Likelihood Estimates of Extreme Value Distribution for Daily
Log Returns of IBM Stock from July 3, 1962 to December 31, 1998^a

Length of Subperiod          Scale $\alpha_n$    Location $\beta_n$    Shape Par. $\xi_n$

Minimal Returns
1 mon. (n = 21, g = 437)     0.823(0.035)        1.902(0.044)          0.197(0.036)
1 qur. (n = 63, g = 145)     0.945(0.077)        2.583(0.090)          0.335(0.076)
6 mon. (n = 126, g = 72)     1.147(0.131)        3.141(0.153)          0.330(0.101)
1 year (n = 252, g = 36)     1.542(0.242)        3.761(0.285)          0.322(0.127)

Maximal Returns
1 mon. (n = 21, g = 437)     0.931(0.039)        2.184(0.050)          0.168(0.036)
1 qur. (n = 63, g = 145)     1.157(0.087)        3.012(0.108)          0.217(0.066)
6 mon. (n = 126, g = 72)     1.292(0.158)        3.471(0.181)          0.349(0.130)
1 year (n = 252, g = 36)     1.624(0.271)        4.475(0.325)          0.264(0.186)

^a Standard errors are in parentheses.


Again the conclusion obtained is similar to that of Longin (1996), who provided a 
good illustration of applying the extreme value theory to stock market returns. 

The results of Table 7.2 were obtained using a Fortran program developed by 
Richard Smith and modified by the author. The package evir of R performs similar 
estimation. S-Plus is also based on the evir package. I demonstrate below the 
commands used. Note that the package uses subgroup maxima in the estimation so 
that negative log returns are used for holding long financial positions. Furthermore, 
xi, sigma, and mu in the package correspond to ($\xi_n$, $\alpha_n$, $\beta_n$) of the table. The estimates
obtained by R and S-Plus are close to those in Table 7.2. A source of minor 
difference is that in Table 7.2 I dropped some data points at the beginning when 
the sample size T is not a multiple of the subgroup size n. Consequently, results 
of the R package have one more subgroup than that of Table 7.2. 


R Demonstration for Extreme Value Analysis
The series is daily IBM log returns from 1962 to 1998. The following output was
edited:


> library(evir)
> help(hill)
> da=read.table("d-ibm6298.txt",header=T)
> ibm=log(da[,2]+1)*100
> nibm=-ibm
> par(mfcol=c(2,1))   # Obtain plots
> hill(ibm,option=c("xi"),end=500)
> hill(nibm,option=c("xi"),end=500)
# A simple R program to compute Hill estimate
> source("Hill.R")
> Hill
function(x,q){
# Compute the Hill estimate of the shape parameter.
# x: data and q: the number of order statistics used.
sx=sort(x)
T=length(x)
ist=T-q
y=log(sx[ist:T])
hill=sum(y[2:length(y)])/q
hill=hill-y[1]
sd=sqrt(hill^2/q)
cat("Hill estimate & std-err:",c(hill,sd),"\n")
}

> m1=Hill(ibm,190)
Hill estimate & std-err: 0.3000144 0.02176533
> m1=Hill(nibm,190)
Hill estimate & std-err: 0.2903796 0.02106635


> m1=gev(nibm,block=21)
> m1
$n.all
[1] 9190
$n
[1] 438
$data
[1] 3.2884827 3.6186920 3.9936970
$block
[1] 21
$par.ests
       xi     sigma        mu
0.1954537 0.8240286 1.9033817
$par.ses
        xi      sigma         mu
0.03553259 0.03477151 0.04413856
$varcov
              [,1]          [,2]          [,3]
[1,]  1.262565e-03 -2.831235e-05 -0.0004336771
[2,] -2.831235e-05  1.209058e-03  0.0008477562
[3,] -4.336771e-04  8.477562e-04  0.0019482125


> names(m1)
[1] "n.all" "n" "data" "block" "par.ests"
[6] "par.ses" "varcov" "converged" "nllh.final"


> plot(m1)
Make a plot selection (or 0 to exit):
1: plot: Scatterplot of Residuals
2: plot: QQplot of Residuals
Selection: 1

Define the residuals of a GEV distribution fit as

$$w_i = \left(1 + \xi_n \frac{r_{n,i} - \beta_n}{\alpha_n}\right)^{-1/\xi_n}.$$

Using the pdf of the GEV distribution and transformation of variables, one can
easily show that $\{w_i\}$ should form an iid random sample of exponentially distributed
random variables if the fitted model is correctly specified. Figure 7.5 shows the
residual plots of the GEV distribution fit to the daily negative IBM log returns
with subperiod length of 21 days. The left panel gives the residuals and the right
panel shows a quantile-to-quantile (QQ) plot against an exponential distribution.
The plots indicate that the fit is reasonable.

Figure 7.5 Residual plots from fitting GEV distribution to daily negative IBM log returns, in
percentage, for data from July 3, 1962, to December 31, 1998, with subperiod length of 21 days.


Remark. Besides evir, several other packages are also available in R to per- 
form extreme value analysis. They are evd, POT, and extRemes. 


7.6 EXTREME VALUE APPROACH TO VAR 


In this section, we discuss an approach to VaR calculation using the extreme 
value theory. The approach is similar to that of Longin (1999a,b), who pro- 
posed an eight-step procedure for the same purpose. We divide the discussion 
into two parts. The first part is concerned with parameter estimation using the 
method discussed in the previous subsections. The second part focuses on VaR 
calculation by relating the probabilities of interest associated with different time 
intervals. 
Part I

Assume that there are $T$ observations of an asset return available in the sample
period. We partition the sample period into $g$ nonoverlapping subperiods of length
$n$ such that $T = ng$. If $T = ng + m$ with $1 \le m < n$, then we delete the first $m$
observations from the sample. The extreme value theory discussed in the previous
section enables us to obtain estimates of the location, scale, and shape parameters
$\beta_n$, $\alpha_n$, and $\xi_n$ for the subperiod maxima $\{r_{n,i}\}$. Plugging the maximum-likelihood
estimates into the CDF in Eq. (7.16) with $x = (r - \beta_n)/\alpha_n$, we can obtain the
quantile of a given probability of the generalized extreme value distribution. Let
$p^*$ be a small upper tail probability that indicates the potential loss and $r_n^*$ be
the $(1 - p^*)$th quantile of the subperiod maxima under the limiting generalized
extreme value distribution. Then we have

$$1 - p^* = \begin{cases}
\exp\left\{-\left[1 + \dfrac{\xi_n (r_n^* - \beta_n)}{\alpha_n}\right]^{-1/\xi_n}\right\} & \text{if } \xi_n \ne 0, \\[2ex]
\exp\left[-\exp\left(-\dfrac{r_n^* - \beta_n}{\alpha_n}\right)\right] & \text{if } \xi_n = 0,
\end{cases}$$

where it is understood that $1 + \xi_n (r_n^* - \beta_n)/\alpha_n > 0$ for $\xi_n \ne 0$. Rewriting this
equation as

$$\ln(1 - p^*) = \begin{cases}
-\left[1 + \dfrac{\xi_n (r_n^* - \beta_n)}{\alpha_n}\right]^{-1/\xi_n} & \text{if } \xi_n \ne 0, \\[2ex]
-\exp\left(-\dfrac{r_n^* - \beta_n}{\alpha_n}\right) & \text{if } \xi_n = 0,
\end{cases}$$

we obtain the quantile as

$$r_n^* = \begin{cases}
\beta_n - \dfrac{\alpha_n}{\xi_n}\left\{1 - [-\ln(1 - p^*)]^{-\xi_n}\right\} & \text{if } \xi_n \ne 0, \\[2ex]
\beta_n - \alpha_n \ln[-\ln(1 - p^*)] & \text{if } \xi_n = 0.
\end{cases} \tag{7.26}$$

In financial applications, the case of $\xi_n \ne 0$ is of major interest.


Part II 
For a given upper tail probability $p^*$, the quantile $r_n^*$ of Eq. (7.26) is the VaR
based on the extreme value theory for the subperiod maximum. The next step is to
make explicit the relationship between subperiod maxima and the observed return
$r_t$ series.

Because most asset returns are either serially uncorrelated or have weak serial
correlations, we may use the relationship in Eq. (7.15) and obtain

$$1 - p^* = P(r_{n,i} \le r_n^*) = [P(r_t \le r_n^*)]^n. \tag{7.27}$$

This relationship between probabilities allows us to obtain VaR for the original
asset return series $r_t$. More precisely, for a specified small upper probability $p$,
the $(1 - p)$th quantile of $r_t$ is $r_n^*$ if the upper tail probability $p^*$ of the subperiod
maximum is chosen based on Eq. (7.27), where $P(r_t \le r_n^*) = 1 - p$. Consequently,
for a given small upper tail probability $p$, the VaR of a financial position with log
return $r_t$ is

$$\text{VaR} = \begin{cases}
\beta_n - \dfrac{\alpha_n}{\xi_n}\left\{1 - [-n \ln(1 - p)]^{-\xi_n}\right\} & \text{if } \xi_n \ne 0, \\[2ex]
\beta_n - \alpha_n \ln[-n \ln(1 - p)] & \text{if } \xi_n = 0,
\end{cases} \tag{7.28}$$

where $n$ is the length of the subperiod.


Summary 
We summarize the approach of applying the traditional extreme value theory to 
VaR calculation as follows: 


1. Select the length of the subperiod $n$ and obtain subperiod maxima $\{r_{n,i}\}$,
$i = 1, \ldots, g$, where $g = [T/n]$.

2. Obtain the maximum-likelihood estimates of $\beta_n$, $\alpha_n$, and $\xi_n$.

3. Check the adequacy of the fitted extreme value model; see the next section 
for some methods of model checking. 

4. If the extreme value model is adequate, apply Eq. (7.28) to calculate VaR. 


Remark. Since we focus on the loss function, maxima of log returns are
used in the derivation. Keep in mind that for a long financial position, the return
series used in the loss function is the negative log returns, not the traditional log
returns.


Example 7.6. Consider the daily log return, in percentage, of IBM stock from 
July 3, 1962, to December 31, 1998. From Table 7.2, we have @, = 0.945, Bn = 
2.583, and 2 = 0.335 for n = 63. Therefore, for the left-tail probability p = 0.01, 
the corresponding VaR is 


0.945 
R = 2.583 — —— {1 — [63 In(1 — 0.01) °° 
Va 583 aa] [—63 In(1 — 0.01)] } 


= 3.04969. 


Thus, for daily negative log returns of the stock, the upper 1% quantile is 3.04969. 
If one holds a long position on the stock worth $10 million, then the estimated VaR 
with probability 1% is $10,000,000 x 0.0304969 = $304, 969. If the probability is 
0.05, then the corresponding VaR is $166, 641. 

If we choose n = 21 (i.e., approximately 1 month), then \(\hat{\alpha}_n = 0.823\),
\(\hat{\beta}_n = 1.902\), and \(\hat{\xi}_n = 0.197\). The upper 1% quantile of the negative
log returns based on the extreme value distribution is

\[
\text{VaR} = 1.902 - \frac{0.823}{0.197}\left\{1 - [-21 \ln(1 - 0.01)]^{-0.197}\right\} = 3.40013.
\]

Therefore, for a long position of $10,000,000, the corresponding 1-day horizon VaR
is $340,013 at the 1% risk level. If the probability is 0.05, then the corresponding
VaR is $184,127. In this particular case, the choice of n = 21 gives higher VaR
values.
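The two calculations in Example 7.6 can be sketched in R as follows. This is only a minimal illustration of Eq. (7.28), not a program from the text; the function name gev_var and its argument names are hypothetical, and the inputs are the rounded estimates quoted in the example, so small rounding differences from the reported figures are possible.

```r
# Quantile-based VaR from a GEV model fitted to subperiod maxima, Eq. (7.28).
# alpha, beta, xi: scale, location, and shape estimates; n: subperiod length;
# p: upper tail probability of the negative log returns (in percent).
gev_var <- function(alpha, beta, xi, n, p) {
  y <- -n * log(1 - p)
  if (xi != 0) {
    beta - (alpha / xi) * (1 - y^(-xi))
  } else {
    beta - alpha * log(y)
  }
}

position <- 1e7  # long position of $10 million

# Rounded estimates quoted in Example 7.6
var_n63 <- gev_var(alpha = 0.945, beta = 2.583, xi = 0.335, n = 63, p = 0.01)
var_n21 <- gev_var(alpha = 0.823, beta = 1.902, xi = 0.197, n = 21, p = 0.01)

# Convert percentage quantiles to dollar amounts
dollar_n63 <- position * var_n63 / 100
dollar_n21 <- position * var_n21 / 100
print(c(var_n63, var_n21))
print(c(dollar_n63, dollar_n21))
```

Calling the same function with p = 0.05 gives the 5% figures, which makes it easy to check how sensitive the result is to the subperiod length n.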

It is somewhat surprising to see that the VaR values obtained in Example 7.6
using the extreme value theory are smaller than those of Example 7.3 that uses a
GARCH(1,1) model. In fact, the VaR values of Example 7.6 are even smaller than
those based on the empirical quantile in Example 7.5. This is due in part to the
choice of probability 0.05. If one chooses probability 0.001 = 0.1% and considers
the same financial position, then we have VaR = $546,641 for the Gaussian
AR(2)-GARCH(1,1) model and VaR = $666,590 for the extreme value theory
with n = 21. Furthermore, the VaR obtained here via the traditional extreme value
theory may not be adequate because the independence assumption of daily log
returns is often rejected by statistical tests. Finally, the use of subperiod maxima
overlooks the volatility clustering in the daily log returns. The new
approach of extreme value theory discussed in the next section overcomes these
weaknesses.


Remark. As shown by the results of Example 7.6, the VaR calculation based 
on the traditional extreme value theory depends on the choice of n, which is the 
length of subperiods. For the limiting extreme value distribution to hold, one would 
prefer a large n. But a larger n means a smaller g when the sample size T is fixed, 
where g is the effective sample size used in estimating the three parameters \(\alpha_n\), \(\beta_n\),
and \(\xi_n\). Therefore, some compromise between the choices of n and g is needed. A
proper choice may depend on the returns of the asset under study. We recommend 
that one should check the stability of the resulting VaR in applying the traditional 
extreme value theory. 


7.6.1 Discussion 


We have applied various methods of VaR calculation to the daily log returns of 
IBM stock for a long position of $10 million. Consider the VaR of the position for 
the next trading day. If the probability is 5%, which means that with probability 
0.95 the loss will be less than or equal to the VaR for the next trading day, then 
the results obtained are 


1. $302,500 for the RiskMetrics

2. $287,200 for a Gaussian AR(2)-GARCH(1,1) model

3. $283,520 for an AR(2)-GARCH(1,1) model with a standardized Student-t
distribution with 5 degrees of freedom

4. $216,030 for using the empirical quantile

5. $184,127 for applying the traditional extreme value theory using monthly
minima (i.e., subperiod length n = 21) of the log returns (or maxima of the
negative log returns)

If the probability is 1%, then the VaR is


1. $426,500 for the RiskMetrics

2. $409,738 for a Gaussian AR(2)-GARCH(1,1) model

3. $475,943 for an AR(2)-GARCH(1,1) model with a standardized Student-t
distribution with 5 degrees of freedom

4. $365,709 for using the empirical quantile

5. $340,013 for applying the traditional extreme value theory using monthly
minima (i.e., subperiod length n = 21)


If the probability is 0.1%, then the VaR becomes 


1. $566,443 for the RiskMetrics

2. $546,641 for a Gaussian AR(2)-GARCH(1,1) model

3. $836,341 for an AR(2)-GARCH(1,1) model with a standardized Student-t
distribution with 5 degrees of freedom

4. $780,712 for using the empirical quantile

5. $666,590 for applying the traditional extreme value theory using monthly
minima (i.e., subperiod length n = 21)


There are substantial differences among different approaches. This is not
surprising because there exists substantial uncertainty in estimating tail behavior of a
statistical distribution. Since there is no true VaR available to compare the accuracy
of different approaches, we recommend that one apply several methods to gain
insight into the range of VaR.

The choice of tail probability also plays an important role in VaR calculation. For 
the daily IBM stock returns, the sample size is 9190 so that the empirical quantiles 
of 5 and 1% are decent estimates of the quantiles of the return distribution. In 
this case, we can treat the results based on empirical quantiles as conservative 
estimates of the true VaR (i.e., lower bounds). In this view, the approach based 
on the traditional extreme value theory seems to underestimate the VaR for the 
daily log returns of IBM stock. The conditional approach of extreme value theory 
discussed in the next section overcomes this weakness. 

When the tail probability is small (e.g., 0.1%), the empirical quantile is a less 
reliable estimate of the true quantile. The VaR based on empirical quantiles can 
no longer serve as a lower bound of the true VaR. Finally, the earlier results show 
clearly the effects of using a heavy-tail distribution in VaR calculation when the 
tail probability is small. The VaR based on either a Student-t distribution with 5 
degrees of freedom or the extreme value distribution is greater than that based on 
the normal assumption when the probability is 0.1%. 


7.6.2 Multiperiod VaR 


The square root of time rule of the RiskMetrics methodology becomes a special
case under the extreme value theory. The proper relationship between \(\ell\)-day and
1-day horizons is

\[
\text{VaR}(\ell) = \ell^{1/\alpha}\,\text{VaR} = \ell^{\xi}\,\text{VaR},
\]

where \(\alpha\) is the tail index and \(\xi\) is the shape parameter of the extreme value
distribution; see Danielsson and de Vries (1997a). This relationship is referred to as the
\(\alpha\) root of time rule. Here \(\alpha = 1/\xi\), not the scale parameter \(\alpha_n\).

For illustration, consider the daily log returns of IBM stock in Example 7.6. If
we use p = 0.01 and the results of n = 63, then for a 30-day horizon we have

\[
\text{VaR}(30) = (30)^{0.335}\,\text{VaR} = 3.125 \times \$304{,}969 = \$952{,}997.
\]

Because \(\ell^{0.335} < \ell^{0.5}\), the \(\alpha\) root of time rule produces lower \(\ell\)-day horizon VaR
than the square root of time rule does.
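The comparison between the two scaling rules can be sketched numerically. The snippet below is a hedged illustration only; the helper name horizon_var is hypothetical, and the inputs are the 1-day VaR and the rounded shape estimate quoted above for n = 63.

```r
# Scale a 1-day VaR to an l-day horizon under the alpha-root-of-time rule,
# VaR(l) = l^xi * VaR, and compare with the square-root-of-time rule.
horizon_var <- function(var_1day, l, xi) l^xi * var_1day

var_1day <- 304969   # 1-day VaR from Example 7.6 (n = 63, p = 0.01)
xi <- 0.335          # rounded shape estimate for n = 63
l <- 30              # horizon in days

var_evt <- horizon_var(var_1day, l, xi)   # extreme value scaling
var_sqrt <- sqrt(l) * var_1day            # square-root-of-time scaling
print(c(factor_evt = l^xi, factor_sqrt = sqrt(l)))
print(c(var_evt = var_evt, var_sqrt = var_sqrt))
```

Because the multiplier is computed here from the rounded 1-day VaR, the dollar amount may differ slightly from the figure reported in the text.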


7.6.3 Return Level 


Another risk measure based on the extreme values of subperiods is the return level. 
The g n-subperiod return level, \(L_{n,g}\), is defined as the level that is exceeded in one
out of every g subperiods of length n. That is,

\[
P(r_{n,i} > L_{n,g}) = \frac{1}{g},
\]

where \(r_{n,i}\) denotes subperiod maximum. The subperiod in which the return level
is exceeded is called a stress period. If the subperiod length n is sufficiently large
so that normalized \(r_{n,i}\) follows the GEV distribution, then the return level is

\[
L_{n,g} = \beta_n + \frac{\alpha_n}{\xi_n}\left\{[-\ln(1 - 1/g)]^{-\xi_n} - 1\right\},
\]

provided that \(\xi_n \neq 0\). Note that this is precisely the quantile of extreme value
distribution given in Eq. (7.26) with tail probability \(p^* = 1/g\), even though we
write it in a slightly different way. Thus, return level applies to the subperiod 
maximum, not to the underlying returns. This marks the difference between VaR 
and return level. 

For the daily negative IBM log returns with subperiod length of 21 days, we can 
use the fitted model to obtain the return level for 12 such subperiods (i.e., g = 12). 
The return level is 4.4835%. 
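As a rough check of the return level formula, the following sketch evaluates it with the rounded n = 21 estimates quoted in Example 7.6. The function name return_level is hypothetical, and because the inputs are rounded the result is only expected to be close to, not identical with, the value reported from the fitted model.

```r
# Return level L_{n,g}: the level exceeded by the subperiod maximum
# in one out of every g subperiods (valid for xi != 0).
return_level <- function(alpha, beta, xi, g) {
  stopifnot(xi != 0, g > 1)
  beta + (alpha / xi) * ((-log(1 - 1 / g))^(-xi) - 1)
}

# Rounded estimates for n = 21 quoted in Example 7.6 (percentage scale)
rl <- return_level(alpha = 0.823, beta = 1.902, xi = 0.197, g = 12)
print(rl)
```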


R and S-Plus Commands for Obtaining Return Level


> m1=gev(nibm,block=21)

# S-Plus output

> rl.21.12=rlevel.gev(m1, k.blocks=12, type='profile')
> class(rl.21.12)
[1] "list"
> names(rl.21.12)
[1] "Range" "rlevel"
> rl.21.12$rlevel
[1] 4.483506

# R output

> rl.21.12=rlevel.gev(m1,k.blocks=12)
> rl.21.12
[1] 4.177923 4.481976 4.858102


In the prior demonstration, the number of subperiods is denoted by k.blocks 
and the subcommand, type = ‘profile’, produces a plot of the profile log- 
likelihood confidence interval for the return level. The plot is not shown here. 


7.7 NEW APPROACH BASED ON THE EXTREME VALUE THEORY 


The aforementioned approach to VaR calculation using the extreme value theory 
encounters some difficulties. First, the choice of subperiod length n is not clearly 
defined. Second, the approach is unconditional and, hence, does not take into con- 
sideration effects of other explanatory variables. To overcome these difficulties, a 
modern approach to extreme value theory has been proposed in the statistical liter- 
ature; see Davison and Smith (1990) and Smith (1989). Instead of focusing on the 
extremes (maximum or minimum), the new approach focuses on exceedances of 
the measurement over some high threshold and the times at which the exceedances 
occur. Thus, this new approach is also referred to as peaks over thresholds (POT). 
For illustration, consider the daily returns of IBM stock used in this chapter and
a long position on the stock. Denote the negative daily log return by \(r_t\). Let \(\eta\)
be a prespecified high threshold. We may choose \(\eta = 2.5\%\). Suppose that the ith
exceedance occurs at day \(t_i\) (i.e., \(r_{t_i} > \eta\)). Then the new approach focuses on the
data \((t_i, r_{t_i} - \eta)\). Here \(r_{t_i} - \eta\) is the exceedance over the threshold \(\eta\) and \(t_i\) is the
time at which the ith exceedance occurs. Similarly, for a short position, we may
choose \(\eta = 2\%\) and focus on the data \((t_i, r_{t_i} - \eta)\) for which \(r_{t_i} > \eta\), where \(r_t\) is
now the log return itself.

In practice, the occurrence times \(\{t_i\}\) provide useful information about the
intensity of the occurrence of important “rare events” (e.g., losses beyond the threshold \(\eta\) for
a long position). A cluster of \(t_i\) indicates a period of large market declines. The
exceeding amount (or exceedance) \(r_{t_i} - \eta\) is also of importance as it provides the
actual quantity of interest.
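The data used by the peaks-over-thresholds approach can be extracted from a return series in a few lines. The sketch below is illustrative only: the helper pot_data is hypothetical, and the short series is made up rather than taken from the IBM data.

```r
# Extract exceedance times t_i and exceedances r_{t_i} - eta
# from a series of negative log returns (hypothetical helper).
pot_data <- function(neg_ret, eta) {
  idx <- which(neg_ret > eta)
  data.frame(time = idx, exceedance = neg_ret[idx] - eta)
}

# Toy series of negative daily log returns (made-up numbers, not IBM data)
neg_ret <- c(0.010, 0.030, -0.020, 0.040, 0.026, 0.005)
exc <- pot_data(neg_ret, eta = 0.025)
print(exc)

# Fraction of days on which the threshold is exceeded
print(nrow(exc) / length(neg_ret))
```

Keeping the times together with the exceedances matters because clustering of the times carries information that the exceedances alone do not.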

Based on the prior introduction, the new approach does not require the choice
of a subperiod length n, but it requires the specification of threshold \(\eta\). Different
choices of the threshold \(\eta\) lead to different estimates of the shape parameter \(\xi\)
(and hence the tail index \(1/\xi\)). In the literature, some researchers believe that
the choice of \(\eta\) is a statistical problem as well as a financial one, and it cannot
be determined based purely on statistical theory. For example, different financial
institutions (or investors) have different risk tolerances. As such, they may select
different thresholds even for an identical financial position. For the daily log returns
of IBM stock considered in this chapter, the calculated VaR is not sensitive to the
choice of \(\eta\).

The choice of threshold \(\eta\) also depends on the observed log returns. For a
stable return series, \(\eta = 2.5\%\) may fare well for a long position. For a volatile
return series (e.g., daily returns of a dot-com stock), \(\eta\) may be as high as 10%.
Limited experience shows that \(\eta\) can be chosen so that the number of exceedances
is sufficiently large (e.g., about 5% of the sample). For a more formal study on the
choice of \(\eta\), see Danielsson and de Vries (1997b).


7.7.1 Statistical Theory 


Again consider the log return \(r_t\) of an asset. Suppose that the ith exceedance
occurs at \(t_i\). Focusing on the exceedance \(r_t - \eta\) and exceeding time \(t_i\) results in a
fundamental change in statistical thinking. Instead of using the marginal distribution
(e.g., the limiting distribution of the minimum or maximum), the new approach
employs a conditional distribution to handle the magnitude of exceedance given
that the measurement exceeds a threshold. The chance of exceeding the threshold
is governed by a probability law. In other words, the new approach considers the
conditional distribution of \(x = r_t - \eta\) given \(r_t \le \eta\) for a long position. Occurrence
of the event \(\{r_t \le \eta\}\) follows a point process (e.g., a Poisson process). See Section
6.9 for the definition of a Poisson process. In particular, if the intensity parameter
\(\lambda\) of the process is time invariant, then the Poisson process is homogeneous. If \(\lambda\) is
time variant, then the process is nonhomogeneous. The concept of Poisson process
can be generalized to the multivariate case.

The basic theory of the new approach is to consider the conditional distribution
of \(r = x + \eta\) given \(r > \eta\) for the limiting distribution of the maximum given in
Eq. (7.16). Since there is no need to choose the subperiod length n, we do not use
it as a subscript of the parameters. Then the conditional distribution of \(r \le x + \eta\)
given \(r > \eta\) is

\[
\Pr(r \le x + \eta \mid r > \eta) = \frac{\Pr(\eta \le r \le x + \eta)}{\Pr(r > \eta)}
= \frac{\Pr(r \le x + \eta) - \Pr(r \le \eta)}{1 - \Pr(r \le \eta)}. \tag{7.29}
\]


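Using the CDF \(F_*(\cdot)\) of Eq. (7.16) and the approximation \(e^{-y} \approx 1 - y\) and after
some algebra, we obtain that

\[
\begin{aligned}
\Pr(r \le x + \eta \mid r > \eta)
&= \frac{F_*(x + \eta) - F_*(\eta)}{1 - F_*(\eta)} \\
&= \frac{\exp\left\{-\left[1 + \frac{\xi(x + \eta - \beta)}{\alpha}\right]^{-1/\xi}\right\}
       - \exp\left\{-\left[1 + \frac{\xi(\eta - \beta)}{\alpha}\right]^{-1/\xi}\right\}}
      {1 - \exp\left\{-\left[1 + \frac{\xi(\eta - \beta)}{\alpha}\right]^{-1/\xi}\right\}} \\
&\approx 1 - \left[1 + \frac{\xi x}{\alpha + \xi(\eta - \beta)}\right]^{-1/\xi},
\end{aligned} \tag{7.30}
\]

where \(x > 0\) and \(1 + \xi(\eta - \beta)/\alpha > 0\). As is seen later, this approximation makes
explicit the connection of the new approach to the traditional extreme value theory.
The case of \(\xi = 0\) is taken as the limit of \(\xi \to 0\) so that

\[
\Pr(r \le x + \eta \mid r > \eta) \approx 1 - \exp(-x/\alpha).
\]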


The distribution with cumulative distribution function

\[
G_{\xi,\psi(\eta)}(x) =
\begin{cases}
1 - \left[1 + \dfrac{\xi x}{\psi(\eta)}\right]^{-1/\xi} & \text{for } \xi \neq 0, \\[1ex]
1 - \exp[-x/\psi(\eta)] & \text{for } \xi = 0,
\end{cases} \tag{7.31}
\]

where \(\psi(\eta) > 0\), \(x \ge 0\) when \(\xi \ge 0\), and \(0 \le x \le -\psi(\eta)/\xi\) when \(\xi < 0\), is called
the generalized Pareto distribution (GPD). Thus, the result of Eq. (7.30) shows that
the conditional distribution of r given \(r > \eta\) is well approximated by a GPD with
parameters \(\xi\) and \(\psi(\eta) = \alpha + \xi(\eta - \beta)\). See Embrechts et al. (1997) for further
information. An important property of the GPD is as follows. Suppose that the
excess distribution of r given a threshold \(\eta_0\) is a GPD with shape parameter \(\xi\)
and scale parameter \(\psi(\eta_0)\). Then, for an arbitrary threshold \(\eta > \eta_0\), the excess
distribution over the threshold \(\eta\) is also a GPD with shape parameter \(\xi\) and scale
parameter \(\psi(\eta) = \psi(\eta_0) + \xi(\eta - \eta_0)\).
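The threshold-stability property just stated can be checked numerically. The sketch below codes the GPD distribution function of Eq. (7.31) under the hypothetical name gpd_cdf and compares the conditional excess distribution over a higher threshold with a GPD whose scale is shifted by \(\xi(\eta - \eta_0)\). The parameter values are illustrative assumptions, not estimates from the text.

```r
# CDF of the generalized Pareto distribution, Eq. (7.31), for x >= 0
gpd_cdf <- function(x, xi, psi) {
  if (xi != 0) {
    1 - pmax(1 + xi * x / psi, 0)^(-1 / xi)
  } else {
    1 - exp(-x / psi)
  }
}

xi <- 0.26; psi0 <- 0.0078   # illustrative values only
d <- 0.005                   # eta - eta0, the increase in the threshold
x <- c(0.001, 0.01, 0.05)    # excesses over the higher threshold eta

# Conditional excess distribution over eta implied by the GPD over eta0
lhs <- (gpd_cdf(d + x, xi, psi0) - gpd_cdf(d, xi, psi0)) /
  (1 - gpd_cdf(d, xi, psi0))
# GPD with the same shape and scale psi(eta) = psi(eta0) + xi * (eta - eta0)
rhs <- gpd_cdf(x, xi, psi0 + xi * d)
print(cbind(x, lhs, rhs))
```

The two columns agree up to floating-point error, which is the algebraic identity behind the linearity of the mean excess function discussed later.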

When & = 0, the GPD in Eq. (7.31) reduces to an exponential distribution. This 
result motivates the use of a QQ plot of excess returns over a threshold against 
exponential distribution to infer the tail behavior of the returns. If \(\xi = 0\), then the
QQ plot should be linear. Figure 7.6(a) shows the QQ plot of daily negative IBM
log returns used in this chapter with threshold 0.025. The nonlinear feature of the
plot clearly shows that the left tail of the daily IBM log returns is heavier than that
of a normal distribution, that is, \(\xi \neq 0\).


R and S-Plus Commands Used to Produce Figure 7.6


> par(mfcol=c(2,1))
> qplot(-ibm,threshold=0.025,main='Negative daily IBM log returns')
> meplot(-ibm)
> title(main='Mean excess plot')


7.7.2 Mean Excess Function 


Given a high threshold \(\eta_0\), suppose that the excess \(r - \eta_0\) follows a GPD with
parameter \(\xi\) and \(\psi(\eta_0)\), where \(0 < \xi < 1\). Then the mean excess over the threshold
\(\eta_0\) is

\[
E(r - \eta_0 \mid r > \eta_0) = \frac{\psi(\eta_0)}{1 - \xi}.
\]


[Figure 7.6: panel (a) plots exponential quantiles against ordered data; panel (b) plots mean excess against threshold.]

Figure 7.6 Plots for daily negative IBM log returns from July 3, 1962, to December 31, 1998. (a) 
QQ plot of excess returns over threshold 2.5% and (b) mean excess plot. 


For any \(\eta > \eta_0\), define the mean excess function \(e(\eta)\) as

\[
e(\eta) = E(r - \eta \mid r > \eta) = \frac{\psi(\eta_0) + \xi(\eta - \eta_0)}{1 - \xi}.
\]

In other words, for any \(y > 0\),

\[
e(\eta_0 + y) = E[r - (\eta_0 + y) \mid r > \eta_0 + y] = \frac{\psi(\eta_0) + \xi y}{1 - \xi}.
\]

Thus, for a fixed \(\xi\), the mean excess function is a linear function of \(y = \eta - \eta_0\).
This result leads to a simple graphical method to infer the appropriate threshold
value \(\eta_0\) for the GPD. Define the empirical mean excess function as

\[
e_T(\eta) = \frac{1}{N_\eta} \sum_{i=1}^{N_\eta} (r_{t_i} - \eta), \tag{7.32}
\]

where \(N_\eta\) is the number of returns that exceed \(\eta\) and \(r_{t_i}\) are the values of the
corresponding returns. See the next subsection for more information on the notation.
The scatterplot of \(e_T(\eta)\) against \(\eta\) is called the mean excess plot, which should be
linear in \(\eta\) for \(\eta > \eta_0\) under the GPD. The plot is also called mean residual life plot.
Figure 7.6(b) shows the mean excess plot of the daily negative IBM log returns. It
shows that, among others, a threshold of about 3% is reasonable for the negative
return series. In the evir package of R and S-Plus, the command for mean excess
plot is meplot.
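For readers who want to see what the mean excess plot computes, Eq. (7.32) can be coded directly. The sketch below is a bare-bones illustration with made-up data; the function name mean_excess is hypothetical and is not how meplot is implemented.

```r
# Empirical mean excess function e_T(eta), Eq. (7.32)
mean_excess <- function(r, eta) {
  exc <- r[r > eta] - eta
  if (length(exc) == 0) NA_real_ else mean(exc)
}

# Toy data (made-up numbers); evaluate over a grid of thresholds
r <- c(1, 2, 3, 4, 5)
thresholds <- c(0.5, 2.5, 4.5)
me <- sapply(thresholds, function(eta) mean_excess(r, eta))
print(cbind(thresholds, me))
```

Plotting me against thresholds for a real return series and looking for the point beyond which the curve is roughly linear is the graphical method described above.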


7.7.3 New Approach to Modeling Extreme Values 


Using the statistical result in Eq. (7.30) and considering jointly the exceedances
and exceeding times, Smith (1989) proposes a two-dimensional Poisson process
to model \((t_i, r_{t_i})\). This approach was used by Tsay (1999) to study VaR in risk
management. We follow the same approach.

Assume that the baseline time interval is D, which is typically a year. In the
United States, D = 252 is used as there are typically 252 trading days in a year.
Let t be the time interval of the data points (e.g., daily) and denote the data span by
\(t = 1, 2, \ldots, T\), where T is the total number of data points. For a given threshold
\(\eta\), the exceeding times over the threshold are denoted by \(\{t_i, i = 1, \ldots, N_\eta\}\) and
the observed log return at \(t_i\) is \(r_{t_i}\). Consequently, we focus on modeling \(\{(t_i, r_{t_i})\}\)
for \(i = 1, \ldots, N_\eta\), where \(N_\eta\) depends on the threshold \(\eta\).

The new approach to applying the extreme value theory is to postulate that
the exceeding times and the associated returns [i.e., \((t_i, r_{t_i})\)] jointly form a
two-dimensional Poisson process with intensity measure given by

\[
\Lambda[(D_1, D_2) \times (r, \infty)] = \frac{D_2 - D_1}{D}\, S(r; \xi, \alpha, \beta), \tag{7.33}
\]

where

\[
S(r; \xi, \alpha, \beta) = \left[1 + \frac{\xi(r - \beta)}{\alpha}\right]_+^{-1/\xi},
\]

\(0 \le D_1 \le D_2 \le T\), \(r > \eta\), \(\alpha > 0\), \(\beta\), and \(\xi\) are parameters, and the notation \([x]_+\)
is defined as \([x]_+ = \max(x, 0)\). This intensity measure says that the occurrence of
exceeding the threshold is proportional to the length of the time interval \([D_1, D_2]\)
and the probability is governed by a survival function similar to the exponent of
the CDF \(F_*(r)\) in Eq. (7.16). A survival function of a random variable X is defined
as \(S(x) = \Pr(X > x) = 1 - \Pr(X \le x) = 1 - \text{CDF}(x)\). When \(\xi = 0\), the intensity
measure is taken as the limit of \(\xi \to 0\); that is,

\[
\Lambda[(D_1, D_2) \times (r, \infty)] = \frac{D_2 - D_1}{D} \exp\left[\frac{-(r - \beta)}{\alpha}\right].
\]

In Eq. (7.33), the length of time interval is measured with respect to the baseline
interval D.




The idea of using the intensity measure in Eq. (7.33) becomes clear when one
considers its implied conditional probability of \(r = x + \eta\) given \(r > \eta\) over the time
interval [0, D], where \(x > 0\),

\[
\frac{\Lambda[(0, D) \times (x + \eta, \infty)]}{\Lambda[(0, D) \times (\eta, \infty)]}
= \left[\frac{1 + \xi(x + \eta - \beta)/\alpha}{1 + \xi(\eta - \beta)/\alpha}\right]^{-1/\xi}
= \left[1 + \frac{\xi x}{\alpha + \xi(\eta - \beta)}\right]^{-1/\xi},
\]

which is precisely the survival function of the conditional distribution given in
Eq. (7.30). This survival function is obtained from the extreme limiting distribution
for maximum in Eq. (7.16). We use survival function here because it denotes the
probability of exceedance.

The relationship between the limiting extreme value distribution in Eq. (7.16) 
and the intensity measure in Eq. (7.33) directly connects the new approach of 
extreme value theory to the traditional one. 

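Mathematically, the intensity measure in Eq. (7.33) can be written as an integral
of an intensity function:

\[
\Lambda[(D_1, D_2) \times (r, \infty)] = \int_{D_1}^{D_2} \int_r^{\infty} \lambda(t, z; \xi, \alpha, \beta)\, dz\, dt,
\]

where the intensity function \(\lambda(t, z; \xi, \alpha, \beta)\) is defined as

\[
\lambda(t, z; \xi, \alpha, \beta) = \frac{1}{D}\, g(z; \xi, \alpha, \beta), \tag{7.34}
\]

where

\[
g(z; \xi, \alpha, \beta) =
\begin{cases}
\dfrac{1}{\alpha}\left[1 + \dfrac{\xi(z - \beta)}{\alpha}\right]^{-(1 + \xi)/\xi} & \text{if } \xi \neq 0, \\[1ex]
\dfrac{1}{\alpha}\exp\left[\dfrac{-(z - \beta)}{\alpha}\right] & \text{if } \xi = 0.
\end{cases}
\]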

Using the results of a Poisson process, we can write down the likelihood function
for the observed exceeding times and their corresponding returns \(\{(t_i, r_{t_i})\}\) over the
two-dimensional space \([0, T] \times (\eta, \infty)\) as

\[
L(\xi, \alpha, \beta) = \left[\prod_{i=1}^{N_\eta} \frac{1}{D}\, g(r_{t_i}; \xi, \alpha, \beta)\right]
\times \exp\left[-\frac{T}{D}\, S(\eta; \xi, \alpha, \beta)\right]. \tag{7.35}
\]

The parameters \(\xi\), \(\alpha\), and \(\beta\) can then be estimated by maximizing the logarithm of
this likelihood function. Since the scale parameter \(\alpha\) is nonnegative, we use \(\ln(\alpha)\)
in the estimation.
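A minimal sketch of the negative log-likelihood implied by Eq. (7.35) is given below, assuming \(\xi \neq 0\). The function name pp_nll and its arguments are hypothetical, and returning Inf outside the region where the terms are positive is a simplification chosen here for numerical optimization, not something prescribed by the text.

```r
# Negative log-likelihood of the homogeneous two-dimensional Poisson model,
# Eq. (7.35), for xi != 0. par = c(xi, log(alpha), beta).
# r_exc: returns exceeding the threshold eta; n_obs: sample size T;
# D: baseline time interval.
pp_nll <- function(par, r_exc, eta, n_obs, D = 252) {
  xi <- par[1]; alpha <- exp(par[2]); beta <- par[3]
  z <- 1 + xi * (r_exc - beta) / alpha     # terms inside g()
  z_eta <- 1 + xi * (eta - beta) / alpha   # term inside S(eta; .)
  if (any(z <= 0) || z_eta <= 0) return(Inf)
  log_g <- -log(alpha) - ((1 + xi) / xi) * log(z)
  loglik <- sum(log_g - log(D)) - (n_obs / D) * z_eta^(-1 / xi)
  -loglik
}

# Single made-up exceedance chosen so the value can be checked by hand
val <- pp_nll(c(0.5, 0, 1), r_exc = 3, eta = 1, n_obs = 252)
print(val)
```

With real data, a function of this form could be handed to a general-purpose optimizer such as optim, with the exceedances of the negative return series passed as r_exc; parameterizing the scale through its logarithm keeps \(\alpha\) positive without explicit constraints.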


Example 7.7. Consider again the daily log returns of IBM stock from July 3,
1962, to December 31, 1998. There are 9190 daily returns. Table 7.3 gives some
estimation results of the parameters \(\xi\), \(\alpha\), and \(\beta\) for three choices of the threshold
when the negative series \(\{-r_t\}\) is used. As mentioned before, we use the negative
series \(\{-r_t\}\) instead of \(\{r_t\}\) because we focus on holding a long financial position.
The table also shows the number of exceeding times for a given threshold. It is seen
that the chance of dropping 2.5% or more in a day for IBM stock occurred with
probability 310/9190 ≈ 3.4%. Because the sample mean of IBM stock returns
is not zero, we also consider the case when the sample mean is removed from
the original daily log returns. From the table, removing the sample mean has
little impact on the parameter estimates. These parameter estimates are used next
to calculate VaR, keeping in mind that in a real application one needs to check
carefully the adequacy of a fitted Poisson model. We discuss methods of model
checking in the next section.

TABLE 7.3 Estimation Results of a Two-Dimensional Homogeneous Poisson Model
for Daily Negative Log Returns of IBM Stock from July 3, 1962 to December 31,
1998 (a)

Thr.   Exc.   Shape Parameter \(\xi\)   Log(Scale) \(\ln(\alpha)\)   Location \(\beta\)
Original Log Returns
3.0%   175    0.30697 (0.09015)    0.30699 (0.12380)    4.69204 (0.19058)
2.5%   310    0.26418 (0.06501)    0.31529 (0.11277)    4.74062 (0.18041)
2.0%   554    0.18751 (0.04394)    0.27655 (0.09867)    4.81003 (0.17209)
Removing the Sample Mean
3.0%   184    0.30516 (0.08824)    0.30807 (0.12395)    4.73804 (0.19151)
2.5%   334    0.28179 (0.06737)    0.31968 (0.12065)    4.76808 (0.18533)
2.0%   590    0.19260 (0.04357)    0.27917 (0.09913)    4.84859 (0.17255)

(a) The baseline time interval is 252 (i.e., 1 year). The numbers in parentheses are standard errors, where
Thr. and Exc. stand for threshold and the number of exceedings.


7.7.4 VaR Calculation Based on the New Approach 


As shown in Eq. (7.30), the two-dimensional Poisson process model used, which
employs the intensity measure in Eq. (7.33), has the same parameters as those
of the extreme value distribution in Eq. (7.16). Therefore, one can use the same
formula as that of Eq. (7.28) to calculate VaR of the new approach. More
specifically, for a given upper tail probability p, the (1 - p)th quantile of the log return
\(r_t\) is

\[
\text{VaR} =
\begin{cases}
\beta - \dfrac{\alpha}{\xi}\left\{1 - [-D \ln(1 - p)]^{-\xi}\right\} & \text{if } \xi \neq 0, \\[1ex]
\beta - \alpha \ln[-D \ln(1 - p)] & \text{if } \xi = 0,
\end{cases} \tag{7.36}
\]

where D is the baseline time interval used in estimation. In the United States, one
typically uses D = 252, which is approximately the number of trading days in a
year.

Example 7.8. Consider again the case of holding a long position of IBM stock
valued at $10 million. We use the estimation results of Table 7.3 to calculate 1-day
horizon VaR for the tail probabilities of 0.05 and 0.01.


• Case I: Use the original daily log returns. The three choices of threshold \(\eta\)
result in the following VaR values:
1. \(\eta = 3.0\%\): VaR(5%) = $228,239, VaR(1%) = $359,303.
2. \(\eta = 2.5\%\): VaR(5%) = $219,106, VaR(1%) = $361,119.
3. \(\eta = 2.0\%\): VaR(5%) = $212,981, VaR(1%) = $368,552.
• Case II: The sample mean of the daily log returns is removed. The three
choices of threshold \(\eta\) result in the following VaR values:
1. \(\eta = 3.0\%\): VaR(5%) = $232,094, VaR(1%) = $363,697.
2. \(\eta = 2.5\%\): VaR(5%) = $225,782, VaR(1%) = $364,254.
3. \(\eta = 2.0\%\): VaR(5%) = $217,740, VaR(1%) = $372,372.


As expected, removing the sample mean, which is positive, slightly increases the 
VaR. However, the VaR is rather stable among the three threshold values used. In 
practice, we recommend that one removes the sample mean first before applying 
this new approach to VaR calculation. 
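The Case I figure for the 2.5% threshold can be approximately reproduced from the Table 7.3 estimates. The sketch below is an illustration of Eq. (7.36) only; the function name pp_var is hypothetical, and because the table entries are rounded, the dollar amounts may differ from the reported ones in the last digit.

```r
# VaR from the two-dimensional Poisson model, Eq. (7.36)
pp_var <- function(xi, log_alpha, beta, p, D = 252) {
  alpha <- exp(log_alpha)
  y <- -D * log(1 - p)
  if (xi != 0) beta - (alpha / xi) * (1 - y^(-xi)) else beta - alpha * log(y)
}

# Table 7.3 estimates: original log returns, threshold 2.5% (percentage scale)
q <- sapply(c(0.05, 0.01), function(p)
  pp_var(xi = 0.26418, log_alpha = 0.31529, beta = 4.74062, p = p))
dollar_var <- 1e7 * q / 100   # $10 million long position
print(round(dollar_var))
```

Repeating the call with the other rows of Table 7.3 shows how little the VaR moves across thresholds, which is the stability noted above.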


Discussion. Compared with the VaR of Example 7.6 that uses the traditional 
extreme value theory, the new approach provides a more stable VaR calcula- 
tion. The traditional approach is rather sensitive to the choice of the subperiod 
length n. 


The command pot of the R package evir can be used to perform the estimation 
of the POT model. We demonstrate it below using the negative log returns of IBM 
stock. As expected, the results are very close to those obtained before. 


R Demonstration Using POT Command


> library(evir)
> m3=pot(nibm,0.025)
> m3
$n
[1] 9190
$period
[1] 1 9190
$data
[1] 0.03288483 0.02648772 0.02817316 .....
$span
[1] 9189
$threshold
[1] 0.025
$p.less.thresh
[1] 0.9662677
$n.exceed
[1] 310
$par.ests
         xi       sigma          mu        beta
0.264078835 0.003182365 0.007557534 0.007788551
$par.ses
          xi        sigma           mu
0.0229175739 0.0001808472 0.0007675515
$varcov
              [,1]          [,2]          [,3]
[1,]  5.252152e-04 -2.873160e-06 -6.970497e-07
[2,] -2.873160e-06  3.270571e-08 -7.907532e-08
[3,] -6.970497e-07 -7.907532e-08  5.891353e-07
$intensity   # intensity function of exceeding the threshold
[1] 0.03373599
> plot(m3)   # model checking
Make a plot selection (or 0 to exit):

1: plot: Point Process of Exceedances
2: plot: Scatterplot of Gaps
3: plot: Qplot of Gaps
4: plot: ACF of Gaps
5: plot: Scatterplot of Residuals
6: plot: Qplot of Residuals
7: plot: ACF of Residuals
8: plot: Go to GPD Plots

Selection:

> riskmeasures(m3,c(0.95,0.99,0.999))
         p   quantile      sfall
[1,] 0.950 0.02208860 0.03162728
[2,] 0.990 0.03616686 0.05075740
[3,] 0.999 0.07019419 0.09699513

7.7.5 Alternative Parameterization 


As mentioned before, for a given threshold \(\eta\), the GPD can also be parameterized
by the shape parameter \(\xi\) and the scale parameter \(\psi(\eta) = \alpha + \xi(\eta - \beta)\). This
is the parameterization used in the evir package of R and S-Plus. Specifically,
(xi,beta) of R and S-Plus corresponds to \([\xi, \psi(\eta)]\) of this chapter. The command
for estimating a GPD model in R and S-Plus is gpd. The output format for S-Plus
is slightly different from that of R. For illustration, consider the daily negative IBM
log return series from 1962 to 1998. The results of R are given below.

R Demonstration
Data are negative IBM log returns. The following output was edited:


> library(evir)
> mgpd=gpd(nibm,threshold=0.025)
> names(mgpd)
 [1] "n"           "data"        "threshold"   "p.less.thresh"
 [5] "n.exceed"    "method"      "par.ests"    "par.ses"
 [9] "varcov"      "information" "converged"   "nllh.final"
> mgpd
$n
[1] 9190
$data
[1] 0.03288483 0.02648772 0.02817316 0.03618692
$threshold
[1] 0.025
$p.less.thresh   # Percentage of data below the threshold.
[1] 0.9662677
$n.exceed   # Number of exceedances
[1] 310
$method
[1] "ml"
$par.ests
         xi        beta
0.264184649 0.007786063
$par.ses
          xi         beta
0.0662137508 0.0006427826
$varcov
              [,1]          [,2]
[1,]  4.384261e-03 -2.461142e-05
[2,] -2.461142e-05  4.131694e-07
> par(mfcol=c(2,2))   # Plots for residual analysis
> plot(mgpd)

Make a plot selection (or 0 to exit):

1: plot: Excess Distribution
2: plot: Tail of Underlying Distribution
3: plot: Scatterplot of Residuals
4: plot: QQplot of Residuals

Selection:

Note that the results are very close to those in Table 7.3, where percentage log
returns are used. The estimates of \(\xi\) and \(\psi(\eta)\) are 0.26418 and \(\alpha + \xi(\eta - \beta) =
\exp(0.31529) + (0.26418)(2.5 - 4.7406) = 0.77873\), respectively, in Table 7.3. In
terms of log returns, the estimate of \(\psi(\eta)\) is 0.007787, which is the same as the R
and S-Plus estimate.

Figure 7.7 Diagnostic plots for GPD fit to daily negative log returns of IBM stock from July 3, 1962,
to December 31, 1998.


Figure 7.7 shows the diagnostic plots for the GPD fit to the daily negative log 
returns of IBM stock. The QQ plot (lower right panel) and the tail probability 
estimate (in log scale and in the lower left panel) show some minor deviation from 
a straight line, indicating further improvement is possible. 

From the conditional distributions in Eqs. (7.29) and (7.30) and the GPD in Eq.
(7.31), we have

\[
\frac{F(y) - F(\eta)}{1 - F(\eta)} \approx G_{\xi,\psi(\eta)}(x),
\]

where \(y = x + \eta\) with \(x > 0\). If we estimate the CDF \(F(\eta)\) of the returns by the
empirical CDF, then

\[
\hat{F}(\eta) = \frac{T - N_\eta}{T},
\]

where \(N_\eta\) is the number of exceedances of the threshold \(\eta\) and T is the sample
size. Consequently, by Eq. (7.31),

\[
F(y) = F(\eta) + G_{\xi,\psi(\eta)}(x)[1 - F(\eta)]
\approx 1 - \frac{N_\eta}{T}\left[1 + \frac{\xi(y - \eta)}{\psi(\eta)}\right]^{-1/\xi}.
\]

This leads to an alternative estimate of the quantile of F(y) for use in VaR
calculation. Specifically, for a small upper tail probability p, let q = 1 - p. Then, by
solving for y, we can estimate the qth quantile of F(y), denoted by \(\text{VaR}_q\), by

\[
\text{VaR}_q = \eta - \frac{\psi(\eta)}{\xi}\left\{1 - \left[\frac{T}{N_\eta}(1 - q)\right]^{-\xi}\right\}, \tag{7.37}
\]

where, as before, \(\eta\) is the threshold, T is the sample size, \(N_\eta\) is the number of
exceedances, and \(\psi(\eta)\) and \(\xi\) are the scale and shape parameters of the GPD
distribution. This method of VaR calculation is used in R and S-Plus.

As mentioned before in Section 7.2.3, expected shortfall (ES) associated with a
given VaR is a useful risk measure. It is defined as the expected loss given that the
VaR is exceeded. For generalized Pareto distribution, ES assumes a simple form.
Specifically, for a given tail probability p, let q = 1 - p and denote the value at
risk by \(\text{VaR}_q\). Then, the expected shortfall is defined by

\[
\text{ES}_q = E(r \mid r > \text{VaR}_q) = \text{VaR}_q + E(r - \text{VaR}_q \mid r > \text{VaR}_q). \tag{7.38}
\]

Using properties of the GPD, it can be shown that

\[
E(r - \text{VaR}_q \mid r > \text{VaR}_q) = \frac{\psi(\eta) + \xi(\text{VaR}_q - \eta)}{1 - \xi},
\]

provided that \(0 < \xi < 1\). Consequently, we have

\[
\text{ES}_q = \frac{\text{VaR}_q}{1 - \xi} + \frac{\psi(\eta) - \xi\eta}{1 - \xi}.
\]
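Equations (7.37) and (7.38) are simple enough to evaluate by hand. The sketch below does so with the GPD estimates printed in the gpd demonstration earlier in this section (threshold 2.5%, 310 exceedances out of 9190 observations). The function names gpd_var and gpd_es are hypothetical; this is a plain transcription of the formulas under those inputs, not the evir source code.

```r
# VaR and expected shortfall from a fitted GPD, Eqs. (7.37) and (7.38)
gpd_var <- function(q, eta, xi, psi, n_obs, n_exc) {
  eta - (psi / xi) * (1 - ((n_obs / n_exc) * (1 - q))^(-xi))
}
gpd_es <- function(q, eta, xi, psi, n_obs, n_exc) {
  v <- gpd_var(q, eta, xi, psi, n_obs, n_exc)
  v / (1 - xi) + (psi - xi * eta) / (1 - xi)
}

# Estimates taken from the gpd output shown earlier (log-return scale)
xi <- 0.264184649; psi <- 0.007786063
q <- c(0.95, 0.99)
v <- gpd_var(q, eta = 0.025, xi = xi, psi = psi, n_obs = 9190, n_exc = 310)
es <- gpd_es(q, eta = 0.025, xi = xi, psi = psi, n_obs = 9190, n_exc = 310)
print(cbind(q, VaR = v, ES = es))

# Scale implied by Table 7.3 (percentage scale): alpha + xi * (eta - beta)
psi_tab <- exp(0.31529) + 0.26418 * (2.5 - 4.74062)
print(psi_tab / 100)
```

The last two lines repeat the conversion between the two parameterizations, showing that the Table 7.3 estimates imply nearly the same GPD scale as the direct fit.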


To illustrate the new method of VaR and ES calculations, we again use the daily
negative log returns of IBM stock with threshold 2.5%. In the evir package of
R and S-Plus, the command to compute VaR and ES via the peaks over threshold
method is riskmeasures:


> riskmeasures(mgpd,c(0.95,0.99,0.999))
         p   quantile      sfall
[1,] 0.950 0.02208959 0.03162619
[2,] 0.990 0.03616405 0.05075390
[3,] 0.999 0.07018944 0.09699565


From the output, the VaR values for the financial position of $10 million are
$220,889 and $361,661, respectively, for tail probability of 0.05 and 0.01. These
two values are rather close to those given in Example 7.8 that are based on the
method of the previous section. The expected shortfalls for the financial position
are $316,272 and $507,576, respectively, for tail probability of 0.05 and 0.01.

7.7.6 Use of Explanatory Variables


The two-dimensional Poisson process model discussed earlier is homogeneous
because the three parameters \(\xi\), \(\alpha\), and \(\beta\) are constant over time. In practice, such
a model may not be adequate. Furthermore, some explanatory variables are often
available that may influence the behavior of the log returns \(r_t\). A nice feature of
the new extreme value theory approach to VaR calculation is that it can easily 
take explanatory variables into consideration. We discuss such a framework in this 
section. In addition, we also discuss methods that can be used to check the adequacy 
of a fitted two-dimensional Poisson process model. 

Suppose that \(x_t = (x_{1t}, \ldots, x_{vt})'\) is a vector of v explanatory variables that are
available prior to time t. For asset returns, the volatility \(\sigma_t^2\) of \(r_t\) discussed in
Chapter 3 is an example of explanatory variables. Another example of explanatory
variables in the U.S. equity markets is an indicator variable denoting the meetings
of the Federal Open Market Committee. A simple way to make use of explanatory
variables is to postulate that the three parameters \(\xi\), \(\alpha\), and \(\beta\) are time varying and
are linear functions of the explanatory variables. Specifically, when explanatory
variables \(x_t\) are available, we assume that

\[
\begin{aligned}
\xi_t &= \gamma_0 + \gamma_1 x_{1t} + \cdots + \gamma_v x_{vt} = \gamma_0 + \gamma' x_t, \\
\ln(\alpha_t) &= \delta_0 + \delta_1 x_{1t} + \cdots + \delta_v x_{vt} = \delta_0 + \delta' x_t, \\
\beta_t &= \theta_0 + \theta_1 x_{1t} + \cdots + \theta_v x_{vt} = \theta_0 + \theta' x_t.
\end{aligned} \tag{7.39}
\]

If \(\gamma = 0\), then the shape parameter \(\xi_t = \gamma_0\), which is time invariant. Thus, testing the
significance of \(\gamma\) can provide information about the contribution of the explanatory
variables to the shape parameter. Similar methods apply to the scale and location
parameters. In Eq. (7.39), we use the same explanatory variables for all three
parameters \(\xi_t\), \(\ln(\alpha_t)\), and \(\beta_t\). In an application, different explanatory variables may
be used for different parameters.

When the three parameters of the extreme value distribution are time varying,
we have an inhomogeneous Poisson process. The intensity measure becomes

\[
\Lambda[(D_1, D_2) \times (r, \infty)] = \frac{D_2 - D_1}{D}
\left[1 + \frac{\xi_t(r - \beta_t)}{\alpha_t}\right]_+^{-1/\xi_t}, \quad r > \eta. \tag{7.40}
\]

The likelihood function of the exceeding times and returns {(t;, 7;,)} becomes 


Ny 


= Ti, Er, Qr, By; exp SÒN; Er, Qt, , 
D ti ti fi ti D o t t t 


i=l 


which reduces to 


Nn T 
1 1 
L= | | [| 580r Ens dns By) | exp |-5 XO SO &, o, s| (7.41) 


i=l t=1 
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if one assumes that the parameters &,, œ+, and 6; are constant within each trading 
day, where g(z; &, œr, By) and S(; &;, œr, By) are given in Eqs. (7.34) and (7.33), 
respectively. For given observations {r;, x;|t = 1, ..., T}, the baseline time interval 
D, and the threshold n, the parameters in Eq. (7.39) can be estimated by maximizing 
the logarithm of the likelihood function in Eq. (7.41). Again we use In(q@;) to satisfy 
the positive constraint of œ+. 


Remark. The parameterization in Eq. (7.39) is similar to that of the volatility
models of Chapter 3 in the sense that the three parameters are exact functions of the
available information at time \(t\). Other functions can be used if necessary.
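To make the estimation step concrete, the following R sketch codes the negative logarithm of the likelihood in Eq. (7.41) under the linear specification of Eq. (7.39). It is only an illustration and not the program used for the results of this chapter. The function name pp_negloglik, its arguments, and the default D = 252 are hypothetical. The forms of S and g written in the comments are the usual ones for this model and are assumed to agree with Eqs. (7.33) and (7.34); the limiting case of a zero shape parameter is not handled.

```r
# Sketch: negative log likelihood of Eq. (7.41) with parameters as in Eq. (7.39).
# Assumed forms (taken to match Eqs. (7.33) and (7.34)):
#   S(z) = [1 + xi (z - beta) / alpha]_+^(-1/xi)
#   g(z) = (1/alpha) [1 + xi (z - beta) / alpha]_+^(-1/xi - 1)
# r   : series whose exceedances over eta are modeled (e.g., negative log
#       returns for a long position)
# X   : optional T x v matrix of explanatory variables (NULL = homogeneous model)
# par : c(gamma_0..gamma_v, delta_0..delta_v, theta_0..theta_v)
pp_negloglik <- function(par, r, eta, X = NULL, D = 252) {
  Z <- cbind(rep(1, length(r)), X)                     # design matrix with intercept
  p <- ncol(Z)
  stopifnot(length(par) == 3 * p)
  xi    <- as.vector(Z %*% par[1:p])                   # shape xi_t
  alpha <- exp(as.vector(Z %*% par[(p + 1):(2 * p)]))  # scale alpha_t > 0
  beta  <- as.vector(Z %*% par[(2 * p + 1):(3 * p)])   # location beta_t
  if (any(abs(xi) < 1e-8)) return(Inf)                 # xi_t = 0 limit not covered here

  base_eta <- 1 + xi * (eta - beta) / alpha
  exc      <- r > eta                                  # exceedance indicator
  base_r   <- (1 + xi * (r - beta) / alpha)[exc]
  # Parameter values outside the support give zero likelihood
  if (any(base_r <= 0) || any(base_eta <= 0 & xi > 0)) return(Inf)

  S     <- pmax(base_eta, 0)^(-1 / xi)                 # S(eta; xi_t, alpha_t, beta_t)
  log_g <- -log(alpha[exc]) - (1 / xi[exc] + 1) * log(base_r)
  -(sum(log_g - log(D)) - sum(S) / D)
}
```

Under these assumptions the function can be handed to a general-purpose optimizer, for example optim(start, pp_negloglik, r = r, eta = 2.5, X = X), where start is a vector of finite initial values.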


7.7.7 Model Checking 


Checking an entertained two-dimensional Poisson process model for exceedance 
times and excesses involves examining three key features of the model. The 
first feature is to verify the adequacy of the exceedance rate, the second 
feature is to examine the distribution of exceedances, and the final feature is 
to check the independence assumption of the model. We discuss briefly some
statistics that are useful for checking these three features. These statistics are
based on some basic statistical theory concerning distributions and stochastic
processes.


Exceedance Rate 

A fundamental property of univariate Poisson processes is that the time durations 
between two consecutive events are independent and exponentially distributed. To 
exploit a similar property for checking a two-dimensional process model, Smith 
and Shively (1995) propose examining the time durations between consecutive 
exceedances. If the two-dimensional Poisson process model is appropriate for
the exceedance times and excesses, the time duration between the \(i\)th and \((i-1)\)th
exceedances should follow an exponential distribution. More specifically, letting
\(t_0 = 0\), we expect that

\[
z_{t_i} = \int_{t_{i-1}}^{t_i} \frac{1}{D}\, g(\eta; \xi_s, \alpha_s, \beta_s)\, ds, \qquad i = 1, 2, \ldots,
\]

are iid as a standard exponential distribution. Because daily returns are discrete-time
observations, we employ the time durations

\[
z_{t_i} = \frac{1}{D} \sum_{t = t_{i-1} + 1}^{t_i} S(\eta; \xi_t, \alpha_t, \beta_t)
\tag{7.42}
\]


and use the QQ plot to check the validity of the iid standard exponential distribution. 
If the model is adequate, the QQ plot should show a straight line through the origin 
with unit slope. 
Distribution of Excesses

Under the two-dimensional Poisson process model considered, the conditional
distribution of the excess \(x_t = r_t - \eta\) over the threshold \(\eta\) is a GPD with shape
parameter \(\xi_t\) and scale parameter \(\psi_t = \alpha_t + \xi_t(\eta - \beta_t)\). Therefore, we can make
use of the relationship between a standard exponential distribution and GPD, and
define

\[
w_{t_i} =
\begin{cases}
\dfrac{1}{\xi_{t_i}} \ln\left(1 + \xi_{t_i} \dfrac{r_{t_i} - \eta}{\psi_{t_i}}\right)_+ & \text{if } \xi_{t_i} \neq 0, \\[2ex]
\dfrac{r_{t_i} - \eta}{\psi_{t_i}} & \text{if } \xi_{t_i} = 0.
\end{cases}
\tag{7.43}
\]

If the model is adequate, \(\{w_{t_i}\}\) are independent and exponentially distributed with
mean 1; see also Smith (1999). We can then apply the QQ plot to check the validity
of the GPD assumption for excesses.


Independence 

A simple way to check the independence assumption, after adjusting for the effects
of explanatory variables, is to examine the sample autocorrelation functions of \(z_{t_i}\)
and \(w_{t_i}\). Under the independence assumption, we expect that both \(z_{t_i}\) and \(w_{t_i}\) have
no serial correlations.
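The three checks can be sketched in R as follows. The code computes the durations of Eq. (7.42) and the transformed excesses of Eq. (7.43) from fitted per-day parameter values. The function name pp_check_stats and its arguments are hypothetical, S is written in the form assumed to match Eq. (7.33), and only the case of a nonzero shape parameter is covered.

```r
# Sketch: model-checking statistics z (Eq. 7.42) and w (Eq. 7.43).
# r                : series whose exceedances over eta are modeled
# xi, alpha, beta  : fitted parameter values for each day t = 1, ..., T
pp_check_stats <- function(r, xi, alpha, beta, eta, D = 252) {
  stopifnot(all(abs(xi) > 1e-8))                  # xi_t = 0 branch not covered here
  idx <- which(r > eta)                           # exceedance times t_i
  S   <- pmax(1 + xi * (eta - beta) / alpha, 0)^(-1 / xi)
  cs  <- cumsum(S) / D                            # running sum of S / D
  z   <- diff(c(0, cs[idx]))                      # Eq. (7.42), using t_0 = 0
  psi <- alpha[idx] + xi[idx] * (eta - beta[idx]) # GPD scale at exceedance times
  w   <- log(pmax(1 + xi[idx] * (r[idx] - eta) / psi, 0)) / xi[idx]  # Eq. (7.43)
  list(z = z, w = w)
}
```

With the output stored in chk, a QQ plot against the standard exponential distribution can be drawn with qqplot(qexp(ppoints(length(chk$z))), chk$z) followed by abline(0, 1), and serial correlations can be inspected with acf(chk$z) and acf(chk$w).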


7.7.8 An Illustration 


In this section, we apply a two-dimensional inhomogeneous Poisson process model 
to the daily log returns, in percentages, of IBM stock from July 3, 1962, to 
December 31, 1998. We focus on holding a long position of $10 million. The 
analysis enables us to compare the results with those obtained before by using 
other approaches to calculating VaR. 

We begin by pointing out that the two-dimensional homogeneous model of
Example 7.7 needs further refinements because the fitted model fails to pass the
model checking statistics of the previous section. Figures 7.8(a) and 7.8(b) show
the autocorrelation functions of the statistics \(z_{t_i}\) and \(w_{t_i}\), defined in Eqs. (7.42) and
(7.43), of the homogeneous model when the threshold is \(\eta = 2.5\%\). The horizontal
lines in the plots denote asymptotic limits of two standard errors. It is seen that both
\(z_{t_i}\) and \(w_{t_i}\) series have some significant serial correlations. Figures 7.9(a) and 7.9(b)
show the QQ plots of the \(z_{t_i}\) and \(w_{t_i}\) series. The straight line in each plot is the
theoretical line, which passes through the origin and has a unit slope under the
assumption of a standard exponential distribution. The QQ plot of \(z_{t_i}\) shows some
discrepancy.

[Figure 7.8: four panels (a)-(d), each showing a sample ACF for lags 1 to 10.]

Figure 7.8 Sample autocorrelation functions of the z and w measures for two-dimensional Poisson
models. Parts (a) and (b) are for homogeneous model and parts (c) and (d) are for inhomogeneous
model. Data are daily mean-corrected log returns, in percentages, of IBM stock from July 3, 1962, to
December 31, 1998, and the threshold is 2.5%. A long financial position is used.

To refine the model, we use the mean-corrected log return series

\[
r_t^o = r_t - \bar{r}, \qquad \bar{r} = \frac{1}{9190} \sum_{t=1}^{9190} r_t,
\]

where \(r_t\) is the daily log return in percentages, and employ the following
explanatory variables:


1. \(x_{1t}\): an indicator variable for October, November, and December. That is,
   \(x_{1t} = 1\) if \(t\) is in October, November, or December. This variable is chosen
   to take care of the fourth-quarter effect (or year-end effect), if any, on the
   daily IBM stock returns.

2. \(x_{2t}\): an indicator variable for the behavior of the previous trading day.
   Specifically, \(x_{2t} = 1\) if and only if the log return \(r_{t-1}^o \le -2.5\%\). Since we focus on
   holding a long position with threshold 2.5%, an exceedance occurs when the
   daily price drops over 2.5%. Therefore, \(x_{2t}\) is used to capture the possibility
   of panic selling when the price of IBM stock dropped 2.5% or more on the
   previous trading day.

3. \(x_{3t}\): a qualitative measurement of volatility, which is the number of days
   between \(t - 1\) and \(t - 5\) (inclusive) that have a log return with magnitude
   exceeding the threshold. In our case, \(x_{3t}\) is the number of \(r_{t-i}^o\) satisfying
   \(|r_{t-i}^o| \ge 2.5\%\) for \(i = 1, \ldots, 5\).

4. \(x_{4t}\): an annual trend defined as \(x_{4t} = (\text{year of time } t - 1961)/38\). This
   variable is used to detect any trend in the behavior of extreme returns of IBM
   stock.

5. \(x_{5t}\): a volatility series based on a Gaussian GARCH(1,1) model for the
   mean-corrected series \(r_t^o\). Specifically, \(x_{5t} = \sigma_t\), where \(\sigma_t^2\) is the conditional
   variance of the GARCH(1,1) model

\[
\begin{aligned}
r_t^o &= a_t, \qquad a_t = \sigma_t \epsilon_t, \qquad \epsilon_t \sim N(0, 1), \\
\sigma_t^2 &= 0.04565 + 0.0807 a_{t-1}^2 + 0.9031 \sigma_{t-1}^2.
\end{aligned}
\]
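The five variables defined above can be built from the mean-corrected returns and their dates along the following lines. This R sketch is an illustration only: the function name make_covariates and its arguments are hypothetical, the GARCH coefficients are the fitted values quoted above rather than re-estimated, and starting the variance recursion at the sample variance is an assumption made here.

```r
# Sketch: construct x1, ..., x5 from mean-corrected log returns ro (in
# percentages) and a Date vector of the same length.
make_covariates <- function(ro, dates, thr = 2.5) {
  n <- length(ro)
  stopifnot(n > 5, length(dates) == n)
  mon <- as.integer(format(dates, "%m"))
  yr  <- as.integer(format(dates, "%Y"))

  x1  <- as.numeric(mon >= 10)                    # fourth-quarter indicator
  x2  <- c(NA, as.numeric(ro[-n] <= -thr))        # previous-day drop of thr or more
  big <- as.numeric(abs(ro) >= thr)
  cb  <- c(0, cumsum(big))                        # cb[j + 1] = count over days 1..j
  x3  <- c(rep(NA, 5), cb[6:n] - cb[1:(n - 5)])   # count over days t-5, ..., t-1
  x4  <- (yr - 1961) / 38                         # annual trend

  sig2 <- numeric(n)
  sig2[1] <- var(ro)                              # assumed starting value
  for (t in 2:n)                                  # GARCH(1,1) recursion from the text
    sig2[t] <- 0.04565 + 0.0807 * ro[t - 1]^2 + 0.9031 * sig2[t - 1]

  data.frame(x1, x2, x3, x4, x5 = sqrt(sig2))
}
```

Every column at row t uses information up to day t - 1 only, in line with the requirement that the explanatory variables be available before time t. The first rows of x2 and x3 are NA because the required lags do not exist.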


These five explanatory variables are all available at time \(t - 1\). We use two
volatility measures (\(x_{3t}\) and \(x_{5t}\)) to study the effect of market volatility on VaR. As
shown in Example 7.3 by the fitted AR(2)-GARCH(1,1) model, the serial
correlations in \(r_t\) are weak so that we do not entertain any ARMA model for the mean
equation.

Using the prior five explanatory variables and deleting insignificant parameters,
we obtain the estimation results shown in Table 7.4. Figures 7.8(c) and 7.8(d)
and Figures 7.9(c) and 7.9(d) show the model checking statistics for the fitted
two-dimensional inhomogeneous Poisson process model when the threshold is
\(\eta = 2.5\%\). All autocorrelation functions of \(z_{t_i}\) and \(w_{t_i}\) are within the asymptotic
two standard error limits. The QQ plots also show marked improvements as they
indicate no model inadequacy. Based on these checking results, the inhomogeneous
model seems adequate.

Figure 7.9 Quantile-to-quantile plot of z and w measures for two-dimensional Poisson models. Parts
(a) and (b) are for homogeneous model and parts (c) and (d) are for inhomogeneous model. Data are
daily mean-corrected log returns, in percentages, of IBM stock from July 3, 1962, to December 31,
1998, and the threshold is 2.5%. A long financial position is used.

TABLE 7.4 Estimation Results of Two-Dimensional Inhomogeneous Poisson Process
Model for Daily Log Returns, in Percentages, of IBM Stock from July 3, 1962 to
December 31, 1998 (a)

Parameter           Constant    Coefficient    Coefficient    Coefficient
                                of x_{3t}      of x_{4t}      of x_{5t}

Threshold 2.5% with 334 Exceedances

\(\beta_t\)          0.3202                     1.4772         2.1991
(Std.err)           (0.3387)                   (0.3222)       (0.2450)
\(\ln(\alpha_t)\)   -0.8119      0.3305         1.0324
(Std.err)           (0.1798)    (0.0826)       (0.2619)
\(\xi_t\)            0.1805      0.2118         0.3551        -0.2602
(Std.err)           (0.1290)    (0.0580)       (0.1503)       (0.0461)

Threshold 3.0% with 184 Exceedances

\(\beta_t\)          1.1569                                    2.1918
(Std.err)           (0.4082)                                  (0.2909)
\(\ln(\alpha_t)\)   -0.0316      0.3336
(Std.err)           (0.1201)    (0.0861)
\(\xi_t\)            0.6008      0.2480                       -0.3175
(Std.err)           (0.1454)    (0.0731)                      (0.0685)

(a) Four explanatory variables defined in the text are used. The model is for holding a long position on
IBM stock. The sample mean of the log returns is removed from the data.
Consider the case of threshold 2.5%. The estimation results show the following:


1. All three parameters of the intensity function depend significantly on the
   annual time trend. In particular, the shape parameter \(\xi_t\) has a positive annual
   trend (the coefficient of \(x_{4t}\) is 0.3551), indicating that the log returns of IBM
   stock are moving farther away from normality as time passes. Both the location
   and scale parameters increase over time.

2. Indicators for the fourth quarter, \(x_{1t}\), and for panic selling, \(x_{2t}\), are not
   significant for any of the three parameters.

3. The location parameter is positively affected, and the shape parameter is
   negatively affected, by the volatility of the GARCH(1,1) model; see the
   coefficients of \(x_{5t}\). This is understandable because the variability of log
   returns increases when the volatility is high. Consequently, the dependence of
   log returns on the tail index is reduced.

4. The scale and shape parameters depend significantly on the qualitative
   measure of volatility \(x_{3t}\). Signs of the estimates are also plausible.


The explanatory variables for December 31, 1998, assumed the values \(x_{3,9190} = 0\),
\(x_{4,9190} = 0.9737\), and \(x_{5,9190} = 1.9766\). Using these values and the fitted model
in Table 7.4, we obtain

\[
\xi_{9190} = 0.01195, \qquad \ln(\alpha_{9190}) = 0.19331, \qquad \beta_{9190} = 6.105.
\]


Assume that the tail probability is 0.05. The VaR quantile shown in Eq. (7.36) gives
VaR = 3.03756%. Consequently, for a long position of $10 million, we have


VaR = $10,000,000 × 0.0303756 = $303,756.


If the tail probability is 0.01, the VaR is $497,425. The 5% VaR is slightly larger
than that of Example 7.3, which uses a Gaussian AR(2)-GARCH(1,1) model. The
1% VaR is larger than that of Case 1 of Example 7.3. Again, as expected, the
effect of extreme values (i.e., heavy tails) on VaR is more pronounced when the
tail probability used is small.

An advantage of using explanatory variables is that the parameters are adaptive
to the change in market conditions. For example, the explanatory variables for
December 30, 1998, assumed the values \(x_{3,9189} = 1\), \(x_{4,9189} = 0.9737\), and
\(x_{5,9189} = 1.8757\). In this case, we have

\[
\xi_{9189} = 0.2500, \qquad \ln(\alpha_{9189}) = 0.52385, \qquad \beta_{9189} = 5.8834.
\]

The 95% quantile (i.e., the tail probability is 5%) then becomes 2.69139%.
Consequently, the VaR is


VaR = $10,000,000 × 0.0269139 = $269,139.


If the tail probability is 0.01, then VaR becomes $448,323. Based on this example,
the homogeneous Poisson model shown in Example 7.8 seems to underestimate
the VaR.
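The parameter values quoted for the last two trading days of 1998 can be checked directly from Eq. (7.39) and the 2.5% threshold estimates of Table 7.4. In the R sketch below, the object names tab74 and pars_at are hypothetical, and cells that are empty in the table (parameters deleted as insignificant) are entered as zero.

```r
# Sketch: evaluate Eq. (7.39) with the threshold-2.5% estimates of Table 7.4.
tab74 <- matrix(c( 0.1805, 0.2118, 0.3551, -0.2602,   # xi_t
                  -0.8119, 0.3305, 1.0324,  0,        # ln(alpha_t)
                   0.3202, 0,      1.4772,  2.1991),  # beta_t
                nrow = 3, byrow = TRUE,
                dimnames = list(c("xi", "ln_alpha", "beta"),
                                c("const", "x3", "x4", "x5")))

# Linear predictor for given values of x3, x4, and x5
pars_at <- function(x3, x4, x5) drop(tab74 %*% c(1, x3, x4, x5))

dec31 <- pars_at(0, 0.9737, 1.9766)   # December 31, 1998
dec30 <- pars_at(1, 0.9737, 1.8757)   # December 30, 1998
print(round(rbind(dec31, dec30), 4))
```

The resulting values agree with those quoted in the text up to rounding of the reported coefficients. Comparing the two rows shows how a single large move within the previous five days (a change of \(x_3\) from 0 to 1) raises the shape and scale parameters.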


7.8 THE EXTREMAL INDEX 


So far our discussions of extreme values are based on the assumption that the 
data are iid random variables. However, in reality extremal events tend to occur in 
clusters because of the serial dependence in the data. For instance, we often observe 
large returns (both positive and negative) of an asset after some news event. In this 
section we extend the theory and applications of extreme values to cases in which 
the data form a strictly stationary time series. The basic concept of the extension
is the extremal index, which allows one to characterize the relationship between the
dependence structure of the data and their extremal behavior. Our discussion will
be brief. Interested readers are referred to Beirlant et al. (2004, Chapter 10) and 
Embrechts et al. (1997). 

Let \(x_1, x_2, \ldots\) be a strictly stationary sequence of random variables with marginal
distribution function \(F(x)\). Consider the case of \(n\) observations \(\{x_i \mid i = 1, \ldots, n\}\).
As before, let \(x_{(n)}\) be the maximum of the data, that is, \(x_{(n)} = \max\{x_i\}\). We seek
the limiting distribution of \((x_{(n)} - \beta_n)/\alpha_n\) for some suitably chosen normalizing
constants \(\alpha_n > 0\) and \(\beta_n\). If \(\{x_i\}\) were iid, Section 7.5 shows that the only
possible nondegenerate limits are the extreme value distributions. What is the limiting
distribution when \(\{x_i\}\) are serially dependent?

To answer this question, we start with a heuristic argument. Suppose that the
serial dependence of the stationary series \(x_i\) decays quickly so that \(x_i\) and \(x_{i+\ell}\) are
essentially independent when \(\ell\) is sufficiently large. In other words, assume that the
long-range dependence of \(x_i\) vanishes quickly. Now divide the data into disjoint
blocks of size \(k\). Specifically, let \(g = [n/k]\) be the largest integer less than or equal
to \(n/k\). The \(i\)th block of the data is then \(\{x_j \mid j = (i - 1)k + 1, \ldots, ik\}\), where
it is understood that the \((g + 1)\)th block may contain fewer than \(k\) observations.
Let \(x_{k,i}\) be the maximum of the \(i\)th block, that is, \(x_{k,i} = \max\{x_j \mid j = (i - 1)k + 1, \ldots, ik\}\).
The collection of block maxima is \(\{x_{k,i} \mid i = 1, \ldots, g + 1\}\). From the
definitions, it is easy to see that

\[
x_{(n)} = \max_{i = 1, \ldots, g+1} x_{k,i}.
\tag{7.44}
\]


That is, the sample maximum is also the maximum of the block maxima. If the
block size \(k\) is sufficiently large and the block maximum \(x_{k,i}\) does not occur near
the end of the \(i\)th block, then \(x_{k,i}\) and \(x_{k,i+1}\) are sufficiently far apart and
essentially independent under the assumption of weak long-range dependence in \(\{x_i\}\).
Consequently, \(\{x_{k,i} \mid i = 1, \ldots, g + 1\}\) can be regarded as a sample of iid random
variables, and the limiting distribution of its maximum, which is \(x_{(n)}\), should be
the extreme value distribution. The prior discussion shows that, under some proper
condition, the limiting distribution of the maximum of a strictly stationary time
series is also the extreme value distribution.

The proper condition needed for the maximum \(x_{(n)}\) of a strictly stationary time
series to have the extreme value limiting distribution is obtained by Leadbetter
(1974) and known as the \(D(u_n)\) condition. Details are given in the next section.
The prior heuristic argument also suggests that, even though the limiting
distribution of \(x_{(n)}\) is also the extreme value distribution, the parameters associated with
the limiting distribution will not be the same as those when \(\{x_i\}\) are iid
random samples because the limiting distribution depends on the marginal
distribution of the underlying sequences. For the iid sequences, the marginal distribution
is \(F(x)\), but for a stationary series the underlying sequences are the block
maxima \(x_{k,i}\) whose marginal distribution is not \(F(x)\). The marginal distribution of \(x_{k,i}\)
depends on \(k\) and the strength of serial dependence in \(\{x_i\}\).


7.8.1 The D(u_n) Condition


Consider the sample \(x_1, x_2, \ldots, x_n\). To place limits on the long-range dependence
of \(\{x_i\}\), let \(u_n\) be a sequence of thresholds increasing at a rate for which the expected
number of exceedances of \(x_i\) over \(u_n\) remains bounded. Mathematically, this says
that \(\limsup_{n \to \infty} n[1 - F(u_n)] < \infty\), where \(F(\cdot)\) is the marginal cumulative distribution
function of \(x_i\). For any positive integers \(p\) and \(q\), suppose that \(i_v\) (\(v = 1, \ldots, p\))
and \(j_t\) (\(t = 1, \ldots, q\)) are arbitrary integers satisfying

\[
1 \le i_1 < i_2 < \cdots < i_p < j_1 < \cdots < j_q \le n,
\]

where \(j_1 - i_p \ge \ell_n\) and \(\ell_n\) is a function of the sample size \(n\) such that \(\ell_n/n \to 0\)
as \(n \to \infty\). Let \(A_1 = \{i_1, i_2, \ldots, i_p\}\) and \(A_2 = \{j_1, j_2, \ldots, j_q\}\) be two sets of time
indices. From the prior condition, elements in \(A_1\) and \(A_2\) are separated by at least
\(\ell_n\) time periods. The condition \(D(u_n)\) is satisfied if

\[
\left| P\left(\max_{i \in A_1 \cup A_2} x_i \le u_n\right)
- P\left(\max_{i \in A_1} x_i \le u_n\right) P\left(\max_{i \in A_2} x_i \le u_n\right) \right|
\le \delta_{n, \ell_n},
\tag{7.45}
\]

where \(\delta_{n, \ell_n} \to 0\) as \(n \to \infty\). This condition says that any two events of the form
\(\{\max_{i \in A_1} x_i \le u_n\}\) and \(\{\max_{i \in A_2} x_i \le u_n\}\) can become asymptotically independent
as the sample size \(n\) increases when the index subsets \(A_1\) and \(A_2\) of \(\{1, 2, \ldots, n\}\)
are separated by a distance \(\ell_n\), which satisfies \(\ell_n/n \to 0\) as \(n \to \infty\). The \(D(u_n)\)
condition looks complicated, but it is relatively weak. For instance, consider
Gaussian sequences with autocorrelation \(\rho_n\) for lag \(n\). The \(D(u_n)\) condition is satisfied
if \(\rho_n \ln(n) \to 0\) as \(n \to \infty\); see Berman (1964).


Leadbetter’s Theorem 1. Suppose that \(\{x_i \mid i = 1, \ldots, n\}\) is a strictly
stationary time series for which there exist sequences of constants \(\alpha_n > 0\) and \(\beta_n\) and a
nondegenerate distribution function \(F_*(\cdot)\) such that

\[
P\left[\frac{x_{(n)} - \beta_n}{\alpha_n} \le x\right] \to_d F_*(x), \qquad n \to \infty,
\]

where \(\to_d\) denotes convergence in distribution. If \(D(u_n)\) holds with \(u_n = \alpha_n x + \beta_n\)
for each \(x\) such that \(F_*(x) > 0\), then \(F_*(x)\) is an extreme value distribution function.

The prior theorem shows that the possible limiting distributions for the maxima
of strictly stationary time series satisfying the \(D(u_n)\) condition are also the extreme
value distributions. As noted before, however, the dependence can affect the limiting
distribution. The effect of the dependence appears in the marginal distribution
of the block maxima \(x_{k,i}\). To state the effect more precisely, let \(\{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n\}\)
be a sequence of iid random variables such that the marginal distribution of \(\tilde{x}_i\) is
the same as that of the stationary time series \(x_i\). Let \(\tilde{x}_{(n)}\) be the maximum of \(\{\tilde{x}_i\}\).
Leadbetter (1983) establishes the following result.


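Leadbetter’s Theorem 2. If there exist sequences of constants \(\alpha_n > 0\) and \(\beta_n\)
and a nondegenerate distribution function \(\tilde{F}_*(x)\) such that

\[
P\left[\frac{\tilde{x}_{(n)} - \beta_n}{\alpha_n} \le x\right] \to_d \tilde{F}_*(x), \qquad n \to \infty,
\]

if the condition \(D(u_n)\) holds with \(u_n = \alpha_n x + \beta_n\) for each \(x\) such that \(\tilde{F}_*(x) > 0\),
and if \(P[(x_{(n)} - \beta_n)/\alpha_n \le x]\) converges for some \(x\), then

\[
P\left[\frac{x_{(n)} - \beta_n}{\alpha_n} \le x\right] \to_d F_*(x) = \tilde{F}_*^{\theta}(x), \qquad n \to \infty,
\]

for some constant \(\theta \in (0, 1]\).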


The constant \(\theta\) is called the extremal index. It plays an important role in
determining the limiting distribution \(F_*(x)\) for the maximum of a strictly stationary time
series. To see this, we provide some simple derivations for the case of \(\xi \neq 0\). From
the result of Eq. (7.16), \(\tilde{F}_*(x)\) is the generalized extreme value distribution and
assumes the form

\[
\tilde{F}_*(x) = \exp\left[-\left(1 + \xi \frac{x - \beta}{\alpha}\right)^{-1/\xi}\right],
\]

where \(\xi \neq 0\) and \(1 + \xi(x - \beta)/\alpha > 0\). In other words, we assume that for the iid
sequence \(\{\tilde{x}_i\}\), the limiting extreme distribution of \(\tilde{x}_{(n)}\) has parameters \(\xi\), \(\beta\), and \(\alpha\).
Based on Theorem 2 of Leadbetter (1983), we have

\[
F_*(x) = \tilde{F}_*^{\theta}(x)
= \exp\left[-\theta\left(1 + \xi \frac{x - \beta}{\alpha}\right)^{-1/\xi}\right]
= \exp\left[-\left(1 + \xi_* \frac{x - \beta_*}{\alpha_*}\right)^{-1/\xi_*}\right],
\tag{7.46}
\]

where \(\xi_* = \xi\), \(\alpha_* = \alpha\theta^{\xi}\), and \(\beta_* = \beta - \alpha(1 - \theta^{\xi})/\xi\). Therefore, for a stationary
time series \(\{x_i\}\) satisfying the \(D(u_n)\) condition, the limiting distribution of the
sample maximum is the generalized extreme value distribution with the shape
parameter \(\xi\), which is the same as that of the iid sequences. On the other hand, the
location and scale parameters are affected by the extremal index \(\theta\) through
\(\alpha_*\) and \(\beta_*\). Results for the case of \(\xi = 0\) can be derived
via the same approach and we have \(\alpha_* = \alpha\) and \(\beta_* = \beta + \alpha \ln(\theta)\).
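The expressions for the location and scale parameters of the stationary case follow from absorbing \(\theta\) into the bracket of the generalized extreme value distribution. A short worked version of this step, using only the quantities defined above, is

\[
\begin{aligned}
\theta \left(1 + \xi \frac{x - \beta}{\alpha}\right)^{-1/\xi}
&= \left[\theta^{-\xi} + \xi \frac{x - \beta}{\alpha \theta^{\xi}}\right]^{-1/\xi} \\
&= \left[1 + \xi \frac{x - \beta + \alpha(1 - \theta^{\xi})/\xi}{\alpha \theta^{\xi}}\right]^{-1/\xi},
\end{aligned}
\]

where the second line uses \(\theta^{-\xi} - 1 = (1 - \theta^{\xi})/\theta^{\xi}\). Matching the last bracket with \(1 + \xi_*(x - \beta_*)/\alpha_*\) gives \(\alpha_* = \alpha\theta^{\xi}\) and \(\beta_* = \beta - \alpha(1 - \theta^{\xi})/\xi\). Since \(0 < \theta \le 1\), we have \(\beta_* \le \beta\) for any \(\xi \neq 0\), so serial dependence shifts the limiting distribution of the maximum to the left.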
A formal definition of the extremal index is as follows: Let \(\{x_i\}\) be a strictly
stationary time series with marginal cumulative distribution function \(F(x)\) and \(\theta\)
a nonnegative number. Assume that for every \(\tau > 0\) there exists a sequence of
thresholds \(u_n\) such that

\[
\lim_{n \to \infty} n[1 - F(u_n)] = \tau,
\tag{7.47}
\]

\[
\lim_{n \to \infty} P(x_{(n)} \le u_n) = \exp(-\theta\tau).
\tag{7.48}
\]

Then \(\theta\) is called the extremal index of the time series \(\{x_i\}\). See Embrechts et al.
(1997). Note that, for the corresponding iid sequence \(\{\tilde{x}_i\}\), under the assumption
that Eq. (7.47) holds, we have

\[
\lim_{n \to \infty} P(\tilde{x}_{(n)} \le u_n) = \lim_{n \to \infty} [F(u_n)]^n
= \lim_{n \to \infty} \left\{1 - \frac{1}{n}\, n[1 - F(u_n)]\right\}^n = \exp(-\tau),
\]

where we have used the property \(\lim_{n \to \infty} (1 - y/n)^n = \exp(-y)\). Thus, the
definition also highlights the role played by the extremal index \(\theta\).


7.8.2 Estimation of the Extremal Index 


There are several ways to estimate the extremal index \(\theta\) of a strictly stationary
time series \(\{x_i\}\). Each estimation method is associated with an interpretation of the
extremal index. In what follows, we discuss some of the estimation methods.


The Blocks Method
From the definition of the extremal index \(\theta\), we have, for a large \(n\), that

\[
P(x_{(n)} \le u_n) \approx P^{\theta}(\tilde{x}_{(n)} \le u_n) = [F(u_n)]^{n\theta},
\]

provided that \(n[1 - F(u_n)] \to \tau > 0\). Hence

\[
\lim_{n \to \infty} \frac{\ln P(x_{(n)} \le u_n)}{n \ln F(u_n)} = \theta.
\tag{7.49}
\]

This limiting relationship suggests a method to estimate \(\theta\). The denominator can
be estimated by the sample quantile, namely

\[
\hat{F}(u_n) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \le u_n)
= 1 - \frac{1}{n} \sum_{i=1}^{n} I(x_i > u_n) = 1 - \frac{N(u_n)}{n},
\]

where \(I(C) = 1\) if the argument \(C\) holds and \(= 0\) otherwise, that is, \(I(C)\) is the
indicator variable for the statement \(C\), and \(N(u_n)\) denotes the number of exceedances
of the sample over the threshold \(u_n\). The numerator \(P(x_{(n)} \le u_n)\) is harder to
estimate. One possibility is to use the block maxima. Specifically, let \(k = k(n)\) be a
properly chosen block size that depends on the sample size \(n\) and, as before, let
\(g = [n/k]\) be the integer part of \(n/k\). For simplicity, assume that \(n = gk\). The \(i\)th
block consists of \(\{x_j \mid j = (i - 1)k + 1, \ldots, ik\}\) and let \(x_{k,i}\) be the maximum
of the \(i\)th block. Using Eq. (7.44) and the approximate independence of block
maxima, we have

\[
P(x_{(n)} \le u_n) = P\left(\max_{1 \le i \le g} x_{k,i} \le u_n\right) \approx [P(x_{k,i} \le u_n)]^{g}.
\]

The probability \(P(x_{k,i} \le u_n)\) can be estimated from the block maxima, that is,

\[
\hat{P}(x_{k,i} \le u_n) = \frac{1}{g} \sum_{i=1}^{g} I(x_{k,i} \le u_n)
= 1 - \frac{1}{g} \sum_{i=1}^{g} I(x_{k,i} > u_n) = 1 - \frac{G(u_n)}{g},
\]

where \(G(u_n)\) is the number of blocks such that the block maximum exceeds the
threshold \(u_n\). Combining the estimators for numerator and denominator, we obtain

\[
\hat{\theta}_b^{(1)} = \frac{g}{n} \frac{\ln[1 - G(u_n)/g]}{\ln[1 - N(u_n)/n]}
= \frac{1}{k} \frac{\ln[1 - G(u_n)/g]}{\ln[1 - N(u_n)/n]},
\tag{7.50}
\]

where the subscript \(b\) signifies the blocks method. Note that \(N(u_n)\) is the number
of exceedances of the sample \(\{x_i\}\) over the threshold \(u_n\) and \(G(u_n)\) is the number
of blocks with one or more exceedances. Using approximation based on Taylor
expansion of \(\ln(1 - x)\), we obtain a second estimator:

\[
\hat{\theta}_b^{(2)} = \frac{1}{k} \frac{G(u_n)/g}{N(u_n)/n} = \frac{G(u_n)}{N(u_n)}.
\]

Based on the results of Hsing et al. (1988), this estimator can also be interpreted as
the reciprocal of the mean cluster size of the limiting compound Poisson process
\(N(u_n)\).
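Both blocks estimators are simple functions of two counts, as the following R sketch shows. The function name extremal_index_blocks is hypothetical. To respect the simplifying assumption \(n = gk\), the sketch drops any incomplete final block, which is a choice made here for illustration.

```r
# Sketch: blocks estimators of the extremal index, Eq. (7.50) and its
# Taylor-expansion approximation G(u)/N(u).
# x : data vector, u : threshold, k : block size
extremal_index_blocks <- function(x, u, k) {
  g <- length(x) %/% k                          # number of complete blocks
  x <- x[seq_len(g * k)]                        # enforce n = g * k
  n <- g * k
  bmax <- apply(matrix(x, nrow = k), 2, max)    # column i holds block i
  N <- sum(x > u)                               # exceedances of the sample
  G <- sum(bmax > u)                            # blocks with at least one exceedance
  # Both estimates are NaN when there is no exceedance (N = 0)
  c(theta1 = log(1 - G / g) / (k * log(1 - N / n)),
    theta2 = G / N)
}
```

For a series with isolated exceedances, each exceedance falls in its own block, so G = N and the second estimate equals 1. Clustering of exceedances within blocks makes G smaller than N and pulls both estimates below 1.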


The Runs Method
O’Brien (1987) proved, under certain weak mixing condition, that

\[
\lim_{n \to \infty} P(x_{2,s} \le u_n \mid x_1 > u_n) = \theta,
\]

where \(x_{2,s} = \max_{2 \le i \le s} x_i\), and \(s\) is a function of the sample size \(n\) satisfying
some growth conditions, including \(s \to \infty\) and \(s/n \to 0\) as \(n \to \infty\). See Beirlant
et al. (2004) and Embrechts et al. (1997) for details. This result has been used to
construct an estimator of \(\theta\) based on runs:

\[
\hat{\theta}_r^{(3)} = \frac{\sum_{i=1}^{n-k} I(A_{i,n})}{\sum_{i=1}^{n} I(x_i > u_n)}
= \frac{\sum_{i=1}^{n-k} I(A_{i,n})}{N(u_n)},
\]

where \(N(u_n)\) is the number of exceedances of the sample \(\{x_i\}\) over the threshold
\(u_n\), \(k\) is a function of \(n\), and \(A_{i,n} = \{x_i > u_n, x_{i+1} \le u_n, \ldots, x_{i+k} \le u_n\}\). Note that
\(A_{i,n}\) denotes the event that an exceedance is followed by a run of \(k\) observations
below the threshold. Since \(k/n \to 0\) as \(n \to \infty\), we can write the runs estimator as

\[
\hat{\theta}_r^{(3)} = \frac{(n - k)^{-1} \sum_{i=1}^{n-k} I(A_{i,n})}{n^{-1} N(u_n)}.
\]
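The runs estimator, in its first form (the count of events \(A_{i,n}\) divided by the number of exceedances), can be sketched in R as follows. The function name extremal_index_runs is hypothetical, and the use of cumulative sums to test for a run of \(k\) non-exceedances is an implementation choice made here.

```r
# Sketch: runs estimator of the extremal index.
# An event A_i occurs when x[i] exceeds u and the next k observations do not.
# x : data vector, u : threshold, k : run length (k < length(x))
extremal_index_runs <- function(x, u, k) {
  n   <- length(x)
  stopifnot(k >= 1, k < n)
  exc <- x > u
  cs  <- c(0, cumsum(exc))                 # cs[j + 1] = exceedances among x[1..j]
  i   <- seq_len(n - k)
  later <- cs[i + k + 1] - cs[i + 1]       # exceedances among x[i+1], ..., x[i+k]
  sum(exc[i] & later == 0) / sum(exc)      # NaN when there is no exceedance
}
```

Each cluster of exceedances contributes one event (its last exceedance, provided that \(k\) quiet observations follow), so the estimate can be read as the number of clusters divided by the number of exceedances.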


Finally, other estimators of \(\theta\) are available in the literature. See, for instance, the
methods discussed in Beirlant et al. (2004). For demonstration, we consider, again,
the negative daily log returns of IBM stock from July 3, 1962, to December 31,
1998. Figure 7.10 shows the estimates of the extremal index for various thresholds
when the block size \(k = 10\). We chose \(k = 10\) because the daily log returns have
weak serial dependence. The estimates are based on the blocks method, that is,
\(\hat{\theta}_b\). From the plot, we see that \(\hat{\theta}_b \approx 0.82\) for threshold 0.025. Indeed, a simple
direct calculation using \(k = 10\) and threshold 0.025 gives \(\hat{\theta}_b = 0.823\). The plot
also shows that the estimate \(\hat{\theta}_b\) of the extremal index might be sensitive to the
choices of threshold and block size \(k\).


[Figure 7.10: estimates of theta (919 blocks of size 10) plotted against K on the
lower horizontal axis and against the threshold on the upper horizontal axis.]

Figure 7.10 Estimates of extremal index for negative daily log returns of IBM stock from July 3,
1962, to December 31, 1998. Block size is k = 10 and lower horizontal axis of plot K denotes number
of blocks whose maximum exceeds threshold.


7.8.3 Value at Risk for a Stationary Time Series


The relationship between \(F_*(x)\) of the maximum of a stationary time series and
\(\tilde{F}_*(x)\) of its iid counterpart established in Theorem 2 of Leadbetter (1983) can be
used to calculate the VaR of a financial position when the associated log returns
form a stationary time series. Specifically, from \(P(x_{(n)} \le u_n) \approx [F(u_n)]^{n\theta}\), the
\((1 - p)\)th quantile of \(F(x)\) is the \((1 - p)^{n\theta}\)th quantile of the limiting extreme value
distribution of \(x_{(n)}\). Consequently, the VaR of Eq. (7.28) based on the extreme
value theory becomes

\[
\text{VaR} =
\begin{cases}
\beta_n - \dfrac{\alpha_n}{\xi_n} \left\{1 - [-n\theta \ln(1 - p)]^{-\xi_n}\right\} & \text{if } \xi_n \neq 0, \\[2ex]
\beta_n - \alpha_n \ln[-n\theta \ln(1 - p)] & \text{if } \xi_n = 0,
\end{cases}
\tag{7.51}
\]

where \(n\) is the length of the subperiod. From the formula, we risk underestimating
the VaR if the extremal index is overlooked.

As an illustration, again consider the negative daily log returns of IBM stock
from July 3, 1962, to December 31, 1998. Using \(\hat{\theta} = 0.823\), the 1% VaR for the
long position of $10 million on the stock for the next trading day becomes 3.2714
for the case of choosing \(n = 63\) days in parameter estimation. As expected, this is
higher than the 3.0497 of Example 7.6 when the extremal index is neglected.


R Demonstration

> library(evir)
> help(exindex)
> m1=exindex(nibm,10)  # Estimate the extremal index of Figure 7.10.
> # VaR calculation.
> 2.583-(.945/.335)*(1-(-63*.823*log(.99))^-.335)
[1] 3.271388
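The VaR calculation of the demonstration can be wrapped in a small function that implements Eq. (7.51) directly. The function name var_extremal and its argument names are hypothetical. Setting theta = 1 recovers the case in which the extremal index is neglected.

```r
# Sketch: VaR of Eq. (7.51) for a stationary series with extremal index theta.
# beta, alpha, xi : location, scale, and shape estimates for subperiod maxima
# n               : subperiod length, p : tail probability
var_extremal <- function(beta, alpha, xi, n, theta = 1, p = 0.01) {
  a <- -n * theta * log(1 - p)
  if (xi != 0) {
    beta - (alpha / xi) * (1 - a^(-xi))
  } else {
    beta - alpha * log(a)
  }
}

with_index    <- var_extremal(2.583, 0.945, 0.335, n = 63, theta = 0.823)
without_index <- var_extremal(2.583, 0.945, 0.335, n = 63)
print(c(with_index, without_index))
```

With the parameter values used in the demonstration, the first call reproduces the VaR of 3.2714 and the second gives the 3.0497 of Example 7.6, which shows the size of the understatement when the extremal index is ignored.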


EXERCISES 


7.1. Consider the daily returns of GE stock from January 2, 1998, to December 31,
     2008. The data can be obtained from CRSP or the file d-ge9808.txt. Convert
     the simple returns into log returns. Suppose that you hold a long position on
     the stock valued at $1 million. Use the tail probability 0.01. Compute the
     value at risk of your position for 1-day horizon and 15-day horizon using the
     following methods:

     (a) The RiskMetrics method.

     (b) A Gaussian ARMA-GARCH model.

     (c) An ARMA-GARCH model with a Student-t distribution. You should also
         estimate the degrees of freedom.

     (d) The traditional extreme value theory with subperiod length n = 21.

7.2. The file d-csco9808.txt contains the daily simple returns of Cisco Systems
     stock from 1998 to 2008 with 2767 observations. Transform the simple returns
     to log returns. Suppose that you hold a long position of Cisco stock valued
     at $1 million. Compute the value at risk of your position for the next trading
     day using probability p = 0.01.

     (a) Use the RiskMetrics method.

     (b) Use a GARCH model with a conditional Gaussian distribution.

     (c) Use a GARCH model with a Student-t distribution. You may also estimate
         the degrees of freedom.

     (d) Use the unconditional sample quantile.

     (e) Use a two-dimensional homogeneous Poisson process with threshold 2%,
         that is, focusing on the exceeding times and exceedances that the daily
         stock price drops 2% or more. Check the fitted model.

     (f) Use a two-dimensional nonhomogeneous Poisson process with threshold
         2%. The explanatory variables are (1) an annual time trend, (2) a dummy
         variable for October, November, and December, and (3) a fitted volatility
         based on a Gaussian GARCH(1,1) model. Perform a diagnostic check on
         the fitted model.

     (g) Repeat the prior two-dimensional nonhomogeneous Poisson process with
         threshold 2.5 or 3%. Comment on the selection of threshold.

7.3. Use Hill’s estimator and the data d-csco9808.txt to estimate the tail index
     for daily log returns of Cisco stock.

7.4. The file d-hpq3dx9808.txt contains dates and the daily simple returns of
     Hewlett-Packard, the CRSP value-weighted index, equal-weighted index, and
     the S&P 500 index from 1998 to 2008. The returns include dividend
     distributions. Transform the simple returns to log returns. Assume that the tail
     probability of interest is 0.01. Calculate value at risk for the following financial
     positions for the first trading day of year 2009.

     (a) Long on Hewlett-Packard stock of $1 million and S&P 500 index of $1
         million using RiskMetrics. The α coefficient of the IGARCH(1,1) model
         for each series should be estimated.

     (b) The same position as part (a) but using a univariate ARMA-GARCH
         model for each return series.

     (c) A long position on Hewlett-Packard stock of $1 million using a
         two-dimensional nonhomogeneous Poisson model with the following
         explanatory variables: (1) an annual time trend, (2) a fitted volatility based on a
         Gaussian GARCH model for Hewlett-Packard stock, (3) a fitted volatility
         based on a Gaussian GARCH model for the S&P 500 index returns, and
         (4) a fitted volatility based on a Gaussian GARCH model for the
         value-weighted index return. Perform a diagnostic check for the fitted models.
         Are the market volatility as measured by the S&P 500 index and
         value-weighted index returns helpful in determining the tail behavior of stock
         returns of Hewlett-Packard? You may choose several thresholds.

7.5. Consider the daily returns of Alcoa (AA) stock and the S&P 500 composite
     index (SPX) from 1998 to 2008. The simple returns and dates are in the file
     d-aaspx9808.txt. Transform the simple returns to log returns and focus on
     the daily negative log returns of AA stock.

     (a) Fit the generalized extreme value distribution to the negative AA log
         returns, in percentages, with subperiods of 21 trading days. Write down
         the parameter estimates and their standard errors. Obtain a scatterplot and
         a QQ plot of the residuals.

     (b) What is the return level of the prior fitted model when 24 subperiods of
         21 days are used?

     (c) Obtain a QQ plot (against exponential distribution) of the negative log
         returns with threshold 2.5% and a mean excess plot of the returns.

     (d) Fit a generalized Pareto distribution to the negative log returns with
         threshold 3.5%. Write down the parameter estimates and their standard errors.

     (e) Obtain (i) a plot of excess distribution, (ii) a plot of the tail of the
         underlying distribution, (iii) a scatterplot of residuals, and (iv) a QQ plot of the
         residuals for the fitted GPD.

     (f) Based on the fitted GPD model, compute the VaR and expected shortfall
         for probabilities q = 0.99 and 0.999.

7.6. Consider, again, the daily log returns of Alcoa (AA) stock in Exercise 7.5.
     Focus now on the daily positive log returns. Answer the same questions as in
     Exercise 7.5. However, use threshold 3% in fitting the GPD model.

7.7. Consider the daily returns of SPX in d-aaspx9808.txt. Transform the
     returns into log returns and focus on the daily negative log returns.

     (a) Fit the generalized extreme value distribution to the negative SPX log
         returns, in percentages, with subperiods of 21 trading days. Write down the
         parameter estimates and their standard errors. Obtain a scatterplot and a
         QQ plot of the residuals.

     (b) What is the return level of the prior fitted model when 24 subperiods of
         21 days are used?

     (c) Obtain a QQ plot (against exponential distribution) of the negative log
         returns with threshold 2.5% and a mean excess plot of the returns.

     (d) Fit a generalized Pareto distribution to the negative log returns with
         threshold 2.5%. Write down the parameter estimates and their standard errors.

     (e) Obtain (i) a plot of excess distribution, (ii) a plot of the tail of the
         underlying distribution, (iii) a scatterplot of residuals, and (iv) a QQ plot of the
         residuals for the fitted GPD.

     (f) Based on the fitted GPD model, compute the VaR and expected shortfall
         for probabilities q = 0.99 and 0.999.

7.8. Consider the daily log returns of the GE stock of Exercise 7.1. Obtain
     estimates \(\hat{\theta}_b^{(1)}\) and \(\hat{\theta}_b^{(2)}\) of the extremal index of (a) the positive return series
     and (b) the negative return series, using block sizes k = 5 and 10 and
     threshold 2.5%.
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CHAPTER 8 


Multivariate Time Series Analysis 
and Its Applications 


Economic globalization and Internet communication have accelerated the integra- 
tion of world financial markets in recent years. Price movements in one market can 
spread easily and instantly to another market. For this reason, financial markets 
are more dependent on each other than ever before, and one must consider them 
jointly to better understand the dynamic structure of global finance. One market
may lead the other market under some circumstances, yet the relationship may be 
reversed under other circumstances. Consequently, knowing how the markets are 
interrelated is of great importance in finance. Similarly, for an investor or a finan- 
cial institution holding multiple assets, the dynamic relationships between returns 
of the assets play an important role in decision making. In this and the next two 
chapters, we introduce econometric models and methods useful for studying jointly 
multiple return series. In the statistical literature, these models and methods belong 
to vector or multivariate time series analysis. 

A multivariate time series consists of multiple single series referred to as com- 
ponents. As such, concepts of vector and matrix are useful in understanding 
multivariate time series analysis. We use boldface notation to indicate vectors 
and matrices. If necessary, readers may consult Appendix A of this chapter for 
some basic operations and properties of vectors and matrices. Appendix B pro- 
vides some results of multivariate normal distribution, which is widely used in 
multivariate statistical analysis (e.g., Johnson and Wichern, 1998). 

Let $\mathbf{r}_t = (r_{1t}, r_{2t}, \ldots, r_{kt})'$ be the log returns of $k$ assets at time $t$,
where $\mathbf{a}'$ denotes the transpose of $\mathbf{a}$. For example, an investor holding stocks of IBM,
Microsoft, Exxon Mobil, General Motors, and Wal-Mart may consider the five-dimensional
daily log returns of these companies. Here $r_{1t}$ denotes the daily log
return of IBM stock, $r_{2t}$ is that of Microsoft, and so on. As a second example,
an investor who is interested in global investment may consider the return series
of the S&P 500 index of the United States, the FTSE 100 index of the United
Kingdom, and the Nikkei 225 index of Japan. Here the series is three-dimensional,
with $r_{1t}$ denoting the return of the S&P 500 index, $r_{2t}$ the return of the Financial
Times Stock Exchange (FTSE) 100 index, and $r_{3t}$ the return of the Nikkei 225.
The goals of this chapter are (a) to explore the basic properties of $\mathbf{r}_t$ and (b) to
study econometric models for analyzing the multivariate data $\{\mathbf{r}_t \mid t = 1, \ldots, T\}$.

Many of the models and methods discussed in previous chapters can be gen- 
eralized directly to the multivariate case. But there are situations in which the 
generalization requires some attention. In some situations, one needs new models 
and methods to handle the complicated relationships between multiple series. In 
this chapter, we discuss these issues with emphasis on intuition and applications. 
For statistical theory of multivariate time series analysis, readers are referred to
Lütkepohl (2005) and Reinsel (1993).


8.1 WEAK STATIONARITY AND CROSS-CORRELATION MATRICES 


Consider a $k$-dimensional time series $\mathbf{r}_t = (r_{1t}, \ldots, r_{kt})'$. The series $\mathbf{r}_t$ is weakly
stationary if its first and second moments are time invariant. In particular, the
mean vector and covariance matrix of a weakly stationary series are constant over
time. Unless stated explicitly to the contrary, we assume that the return series of
financial assets are weakly stationary.

For a weakly stationary time series $\mathbf{r}_t$, we define its mean vector and covariance
matrix as

$$\boldsymbol{\mu} = E(\mathbf{r}_t), \qquad \boldsymbol{\Gamma}_0 = E[(\mathbf{r}_t - \boldsymbol{\mu})(\mathbf{r}_t - \boldsymbol{\mu})'], \tag{8.1}$$

where the expectation is taken element by element over the joint distribution of $\mathbf{r}_t$.
The mean $\boldsymbol{\mu}$ is a $k$-dimensional vector consisting of the unconditional expectations
of the components of $\mathbf{r}_t$. The covariance matrix $\boldsymbol{\Gamma}_0$ is a $k \times k$ matrix. The $i$th
diagonal element of $\boldsymbol{\Gamma}_0$ is the variance of $r_{it}$, whereas the $(i, j)$th element of $\boldsymbol{\Gamma}_0$ is
the covariance between $r_{it}$ and $r_{jt}$. We write $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_k)'$ and $\boldsymbol{\Gamma}_0 = [\Gamma_{ij}(0)]$
when the elements are needed.


8.1.1 Cross-Correlation Matrices 


Let $\mathbf{D}$ be a $k \times k$ diagonal matrix consisting of the standard deviations of $r_{it}$ for
$i = 1, \ldots, k$. In other words, $\mathbf{D} = \mathrm{diag}\{\sqrt{\Gamma_{11}(0)}, \ldots, \sqrt{\Gamma_{kk}(0)}\}$. The concurrent,
or lag-zero, cross-correlation matrix of $\mathbf{r}_t$ is defined as

$$\boldsymbol{\rho}_0 \equiv [\rho_{ij}(0)] = \mathbf{D}^{-1} \boldsymbol{\Gamma}_0 \mathbf{D}^{-1}.$$

More specifically, the $(i, j)$th element of $\boldsymbol{\rho}_0$ is

$$\rho_{ij}(0) = \frac{\Gamma_{ij}(0)}{\sqrt{\Gamma_{ii}(0)\,\Gamma_{jj}(0)}} = \frac{\mathrm{Cov}(r_{it}, r_{jt})}{\mathrm{std}(r_{it})\,\mathrm{std}(r_{jt})},$$

which is the correlation coefficient between $r_{it}$ and $r_{jt}$. In time series analysis,
such a correlation coefficient is referred to as a concurrent, or contemporaneous,
correlation coefficient because it is the correlation of the two series at time $t$. It is
easy to see that $\rho_{ij}(0) = \rho_{ji}(0)$, $-1 \le \rho_{ij}(0) \le 1$, and $\rho_{ii}(0) = 1$ for $1 \le i, j \le k$.
Thus, $\boldsymbol{\rho}_0$ is a symmetric matrix with unit diagonal elements.

An important topic in multivariate time series analysis is the lead-lag relationships
between component series. To this end, the cross-correlation matrices are
used to measure the strength of linear dependence between time series. The lag-$\ell$
cross-covariance matrix of $\mathbf{r}_t$ is defined as

$$\boldsymbol{\Gamma}_\ell \equiv [\Gamma_{ij}(\ell)] = E[(\mathbf{r}_t - \boldsymbol{\mu})(\mathbf{r}_{t-\ell} - \boldsymbol{\mu})'], \tag{8.2}$$

where $\boldsymbol{\mu}$ is the mean vector of $\mathbf{r}_t$. Therefore, the $(i, j)$th element of $\boldsymbol{\Gamma}_\ell$ is the
covariance between $r_{it}$ and $r_{j,t-\ell}$. For a weakly stationary series, the cross-covariance
matrix $\boldsymbol{\Gamma}_\ell$ is a function of $\ell$, not the time index $t$.

The lag-$\ell$ cross-correlation matrix (CCM) of $\mathbf{r}_t$ is defined as

$$\boldsymbol{\rho}_\ell \equiv [\rho_{ij}(\ell)] = \mathbf{D}^{-1} \boldsymbol{\Gamma}_\ell \mathbf{D}^{-1}, \tag{8.3}$$

where, as before, $\mathbf{D}$ is the diagonal matrix of standard deviations of the individual
series $r_{it}$. From the definition,

$$\rho_{ij}(\ell) = \frac{\Gamma_{ij}(\ell)}{\sqrt{\Gamma_{ii}(0)\,\Gamma_{jj}(0)}} = \frac{\mathrm{Cov}(r_{it}, r_{j,t-\ell})}{\mathrm{std}(r_{it})\,\mathrm{std}(r_{jt})}, \tag{8.4}$$

which is the correlation coefficient between $r_{it}$ and $r_{j,t-\ell}$. When $\ell > 0$, this
correlation coefficient measures the linear dependence of $r_{it}$ on $r_{j,t-\ell}$, which occurred
prior to time $t$. Consequently, if $\rho_{ij}(\ell) \ne 0$ and $\ell > 0$, we say that the series $r_{jt}$
leads the series $r_{it}$ at lag $\ell$. Similarly, $\rho_{ji}(\ell)$ measures the linear dependence of $r_{jt}$
and $r_{i,t-\ell}$, and we say that the series $r_{it}$ leads the series $r_{jt}$ at lag $\ell$ if $\rho_{ji}(\ell) \ne 0$
and $\ell > 0$. Equation (8.4) also shows that the diagonal element $\rho_{ii}(\ell)$ is simply the
lag-$\ell$ autocorrelation coefficient of $r_{it}$.

Based on this discussion, we obtain some important properties of the cross
correlations when $\ell > 0$. First, in general, $\rho_{ij}(\ell) \ne \rho_{ji}(\ell)$ for $i \ne j$ because the
two correlation coefficients measure different linear relationships between $\{r_{it}\}$ and
$\{r_{jt}\}$. Therefore, $\boldsymbol{\Gamma}_\ell$ and $\boldsymbol{\rho}_\ell$ are in general not symmetric. Second, using
$\mathrm{Cov}(x, y) = \mathrm{Cov}(y, x)$ and the weak stationarity assumption, we have

$$\mathrm{Cov}(r_{it}, r_{j,t-\ell}) = \mathrm{Cov}(r_{j,t-\ell}, r_{it}) = \mathrm{Cov}(r_{jt}, r_{i,t+\ell}) = \mathrm{Cov}(r_{jt}, r_{i,t-(-\ell)}),$$

so that $\Gamma_{ij}(\ell) = \Gamma_{ji}(-\ell)$. Because $\Gamma_{ji}(-\ell)$ is the $(j, i)$th element of the matrix $\boldsymbol{\Gamma}_{-\ell}$
and the equality holds for $1 \le i, j \le k$, we have $\boldsymbol{\Gamma}_\ell = \boldsymbol{\Gamma}'_{-\ell}$ and $\boldsymbol{\rho}_\ell = \boldsymbol{\rho}'_{-\ell}$.
Consequently, unlike the univariate case, $\boldsymbol{\rho}_\ell \ne \boldsymbol{\rho}_{-\ell}$ for a general vector time series when
$\ell > 0$. Because $\boldsymbol{\rho}_\ell = \boldsymbol{\rho}'_{-\ell}$, it suffices in practice to consider the cross-correlation
matrices $\boldsymbol{\rho}_\ell$ for $\ell \ge 0$.


8.1.2 Linear Dependence


Considered jointly, the cross-correlation matrices $\{\boldsymbol{\rho}_\ell \mid \ell = 0, 1, \ldots\}$ of a weakly
stationary vector time series contain the following information:


1. The diagonal elements $\{\rho_{ii}(\ell) \mid \ell = 0, 1, \ldots\}$ are the autocorrelation function
of $r_{it}$.

2. The off-diagonal element $\rho_{ij}(0)$ measures the concurrent linear relationship
between $r_{it}$ and $r_{jt}$.

3. For $\ell > 0$, the off-diagonal element $\rho_{ij}(\ell)$ measures the linear dependence of
$r_{it}$ on the past value $r_{j,t-\ell}$.


Therefore, if $\rho_{ij}(\ell) = 0$ for all $\ell > 0$, then $r_{it}$ does not depend linearly on any
past value $r_{j,t-\ell}$ of the $r_{jt}$ series.

In general, the linear relationship between two time series $\{r_{it}\}$ and $\{r_{jt}\}$ can be
summarized as follows:


1. $r_{it}$ and $r_{jt}$ have no linear relationship if $\rho_{ij}(\ell) = \rho_{ji}(\ell) = 0$ for all $\ell \ge 0$.

2. $r_{it}$ and $r_{jt}$ are concurrently correlated if $\rho_{ij}(0) \ne 0$.

3. $r_{it}$ and $r_{jt}$ have no lead-lag relationship if $\rho_{ij}(\ell) = 0$ and $\rho_{ji}(\ell) = 0$ for all
$\ell > 0$. In this case, we say the two series are uncoupled.

4. There is a unidirectional relationship from $r_{it}$ to $r_{jt}$ if $\rho_{ij}(\ell) = 0$ for all
$\ell > 0$, but $\rho_{ji}(v) \ne 0$ for some $v > 0$. In this case, $r_{it}$ does not depend on
any past value of $r_{jt}$, but $r_{jt}$ depends on some past values of $r_{it}$.

5. There is a feedback relationship between $r_{it}$ and $r_{jt}$ if $\rho_{ij}(\ell) \ne 0$ for some
$\ell > 0$ and $\rho_{ji}(v) \ne 0$ for some $v > 0$.


The conditions stated earlier are sufficient conditions. A more informative approach
to study the relationship between time series is to build a multivariate model for
the series because a properly specified model considers simultaneously the serial
and cross correlations among the series.


8.1.3 Sample Cross-Correlation Matrices 


Given the data $\{\mathbf{r}_t \mid t = 1, \ldots, T\}$, the cross-covariance matrix $\boldsymbol{\Gamma}_\ell$ can be
estimated by

$$\hat{\boldsymbol{\Gamma}}_\ell = \frac{1}{T} \sum_{t=\ell+1}^{T} (\mathbf{r}_t - \bar{\mathbf{r}})(\mathbf{r}_{t-\ell} - \bar{\mathbf{r}})', \qquad \ell \ge 0, \tag{8.5}$$

where $\bar{\mathbf{r}} = (\sum_{t=1}^{T} \mathbf{r}_t)/T$ is the vector of sample means. The cross-correlation matrix
$\boldsymbol{\rho}_\ell$ is estimated by

$$\hat{\boldsymbol{\rho}}_\ell = \hat{\mathbf{D}}^{-1} \hat{\boldsymbol{\Gamma}}_\ell \hat{\mathbf{D}}^{-1}, \qquad \ell \ge 0, \tag{8.6}$$

where $\hat{\mathbf{D}}$ is the $k \times k$ diagonal matrix of the sample standard deviations of the
component series.

Similar to the univariate case, asymptotic properties of the sample cross-correlation
matrix $\hat{\boldsymbol{\rho}}_\ell$ have been investigated under various assumptions; see, for
instance, Fuller (1976, Chapter 6). The estimate is consistent but is biased in a
finite sample. For asset return series, the finite sample distribution of $\hat{\boldsymbol{\rho}}_\ell$ is rather
complicated partly because of the presence of conditional heteroscedasticity and
high kurtosis. If the finite-sample distribution of cross correlations is needed,
we recommend that proper bootstrap resampling methods be used to obtain
an approximate estimate of the distribution. For many applications, a crude
approximation of the variance of $\hat{\rho}_{ij}(\ell)$ is sufficient.
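
The estimators in Eqs. (8.5) and (8.6) are simple to compute directly. The following Python sketch is only an illustration: the function name `sample_ccm` and the toy data are assumptions made here, not part of the text. It divides by $T$ and uses the full-sample means, as in Eq. (8.5).

```python
import math

def sample_ccm(series, lag):
    """Sample lag-`lag` cross-correlation matrix of k equal-length series.

    Follows Eqs. (8.5)-(8.6): cross-covariances are divided by T and
    centered at the full-sample means.
    """
    k, T = len(series), len(series[0])
    means = [sum(x) / T for x in series]

    def gamma(i, j, l):
        # (i, j)th element of the sample cross-covariance matrix at lag l
        return sum((series[i][t] - means[i]) * (series[j][t - l] - means[j])
                   for t in range(l, T)) / T

    sd = [math.sqrt(gamma(i, i, 0)) for i in range(k)]
    return [[gamma(i, j, lag) / (sd[i] * sd[j]) for j in range(k)]
            for i in range(k)]

# Toy data (hypothetical, not from the book): y is x delayed by one period,
# so series 1 (x) should lead series 2 (y) at lag 1.
x = [1.0, -2.0, 0.5, 3.0, -1.5, 2.0, -0.5, 1.5]
y = [0.0] + x[:-1]
rho0 = sample_ccm([x, y], 0)
rho1 = sample_ccm([x, y], 1)
print(round(rho1[1][0], 2), round(rho1[0][1], 2))
```

In this toy case the (2, 1)th element of the lag-1 matrix is large while the (1, 2)th element is small, which matches the lead-lag interpretation of Section 8.1.2: the second series depends on the lagged first series but not the other way around.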


Example 8.1. Consider the monthly log returns of IBM stock and the S&P 500
index from January 1926 to December 2008 with 996 observations. The returns
include dividend payments and are in percentages. Denote the returns of IBM
stock and the S&P 500 index by $r_{1t}$ and $r_{2t}$, respectively. These two returns form a
bivariate time series $\mathbf{r}_t = (r_{1t}, r_{2t})'$. Figure 8.1 shows the time plots of $\mathbf{r}_t$. Figure 8.2
shows some scatterplots of the two series. The plots show that the two return series
are concurrently correlated. Indeed, the sample concurrent correlation coefficient
between the two returns is 0.65, which is statistically significant at the 5% level.
However, the cross correlations at lag 1 are weak if any.

Figure 8.1 Time plots of monthly log returns, in percentages, for (a) IBM stock and (b) the S&P 500
index from January 1926 to December 2008.

Figure 8.2 Some scatterplots for monthly log returns of IBM stock and S&P 500 index: (a) concurrent
plot of IBM vs. S&P 500, (b) S&P 500 vs. lag-1 IBM, (c) IBM vs. lag-1 S&P 500, and (d) S&P 500
vs. lag-1 S&P 500.


Table 8.1 provides some summary statistics and cross-correlation matrices of the 
two series. For a bivariate series, each CCM is a 2 x 2 matrix with four correlations. 
Empirical experience indicates that it is rather hard to absorb simultaneously many 
cross-correlation matrices, especially when the dimension k is greater than 3. To 
overcome this difficulty, we use the simplifying notation of Tiao and Box (1981) 
and define a simplified cross-correlation matrix consisting of three symbols “+,” 
“—” and “.” where they have the following meaning: 


1. Plus sign (+) means that the corresponding correlation coefficient is greater
than or equal to $2/\sqrt{T}$.

2. Minus sign (-) means that the corresponding correlation coefficient is less
than or equal to $-2/\sqrt{T}$.

3. Period (.) means that the corresponding correlation coefficient is between
$-2/\sqrt{T}$ and $2/\sqrt{T}$.


Here $1/\sqrt{T}$ is the asymptotic standard error of the sample correlation under the
assumption that $\mathbf{r}_t$ is a white noise series, so $2/\sqrt{T}$ is the approximate asymptotic
5% critical value.
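
The three-symbol rule can be applied mechanically to any sample CCM. The sketch below is a minimal illustration; the function name `simplify_ccm` and the example matrix are hypothetical, and only the sample size $T = 996$ is taken from Example 8.1.

```python
import math

def simplify_ccm(ccm, T):
    """Map each sample correlation to '+', '-' or '.' using the 2/sqrt(T) rule."""
    limit = 2.0 / math.sqrt(T)

    def symbol(c):
        if c >= limit:
            return "+"
        if c <= -limit:
            return "-"
        return "."

    return [[symbol(c) for c in row] for row in ccm]

# Hypothetical 2 x 2 sample CCM; T = 996 as in Example 8.1,
# so the limit is 2 / sqrt(996), roughly 0.063.
example = [[0.04, 0.10], [-0.07, 0.01]]
symbols = simplify_ccm(example, 996)
for row in symbols:
    print(" ".join(row))
```

With these illustrative values only the two off-diagonal entries exceed the limit in absolute value, so they are the only ones shown as a sign.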




TABLE 8.1 Summary Statistics and Cross-Correlation Matrices of Monthly Log
Returns of IBM Stock and S&P 500 Index: January 1926 to December 2008

(a) Summary Statistics

Ticker   Mean    Standard Error   Skewness   Excess Kurtosis   Minimum   Maximum
IBM      1.089   7.033            -0.068     2.622             -30.37    38.57
SP5      0.430   5.537            -0.521     7.927             -35.59    35.22

(b) Cross-Correlation Matrices

   Lag 1          Lag 2          Lag 3          Lag 4          Lag 5
0.04  0.10     0.00  0.08     0.01  0.06     0.03  0.03     0.02  0.08
0.04  0.08     0.02  0.02     0.06  0.10     0.04  0.03     0.00  0.09

(c) Simplified notation


Table 8.1(c) shows the simplified CCM for the monthly log returns of IBM 
stock and the S&P 500 index. It is easily seen that significant cross correlations at 
the approximate 5% level appear mainly at lags 1 and 3. An examination of the 
sample CCMs at these two lags indicates that (a) S&P 500 index returns have some 
marginal autocorrelations at lags 1, 2, 3, and 5 and (b) IBM stock returns depend 
weakly on the previous returns of the S&P 500 index. The latter observation is 
based on the significance of cross correlations at the (1, 2)th element of lag-1, lag-2 
and lag-5 CCMs. 

Figure 8.3 shows the sample autocorrelations and cross correlations of the two 
series. The upper-left plot is the sample ACF of IBM stock returns and the upper- 
right plot shows the dependence of IBM stock returns on the lagged S&P 500 index 
returns. The dashed lines in the plots are the asymptotic two standard error limits 
of the sample auto- and cross-correlation coefficients. From the plots, the dynamic 
relationship is weak between the two return series, but their contemporaneous 
correlation is statistically significant. 


Figure 8.3 Sample auto- and cross-correlation functions (CCF) of two monthly log return series:
(a) sample ACF of IBM stock returns, (b) cross-correlations between S&P 500 index and lagged IBM
stock returns (lower left), (c) cross correlations between IBM stock and lagged S&P 500 index returns,
and (d) sample ACF of S&P 500 index returns. Dashed lines denote 95% limits.


Example 8.2. Consider the simple returns of monthly indexes of U.S. government
bonds with maturities in 30 years, 20 years, 10 years, 5 years, and 1 year.
The data obtained from the CRSP database have 696 observations starting from
January 1942 to December 1999. Let $\mathbf{r}_t = (r_{1t}, \ldots, r_{5t})'$ be the return series with
decreasing time to maturity. Figure 8.4 shows the time plots of $\mathbf{r}_t$ on the same scale.
The variability of the 1-year bond returns is much smaller than that of returns with
longer maturities. The sample means and standard deviations of the data are
$\hat{\boldsymbol{\mu}} = 10^{-2}(0.43, 0.45, 0.45, 0.46, 0.44)'$ and $\hat{\boldsymbol{\sigma}} = 10^{-2}(2.53, 2.43, 1.97, 1.39, 0.53)'$.
The concurrent correlation matrix of the series is

$$\hat{\boldsymbol{\rho}}_0 = \begin{bmatrix}
1.00 & 0.98 & 0.92 & 0.85 & 0.63 \\
0.98 & 1.00 & 0.91 & 0.86 & 0.64 \\
0.92 & 0.91 & 1.00 & 0.90 & 0.68 \\
0.85 & 0.86 & 0.90 & 1.00 & 0.82 \\
0.63 & 0.64 & 0.68 & 0.82 & 1.00
\end{bmatrix}.$$

It is not surprising that (a) the series have high concurrent correlations, and (b) the 
correlations between long-term bonds are higher than those between short-term 
bonds. 

Table 8.2 gives the lag-1 and lag-2 cross-correlation matrices of r, and the 
corresponding simplified matrices. Most of the significant cross correlations are at 
lag 1, and the five return series appear to be intercorrelated. In addition, lag-1 and 
lag-2 sample ACFs of the 1-year bond returns are substantially higher than those 
of other series with longer maturities.

Figure 8.4 Time plots of monthly simple returns of five indexes of U.S. government bonds with
maturities in (a) 30 years, (b) 20 years, (c) 10 years, (d) 5 years, and (e) 1 year. Sample period is from
January 1942 to December 1999.


8.1.4 Multivariate Portmanteau Tests 


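The univariate Ljung-Box statistic $Q(m)$ has been generalized to the multivariate
case by Hosking (1980, 1981) and Li and McLeod (1981). For a multivariate
series, the null hypothesis of the test statistic is $H_0: \boldsymbol{\rho}_1 = \cdots = \boldsymbol{\rho}_m = \mathbf{0}$, and the
alternative hypothesis is $H_a: \boldsymbol{\rho}_i \ne \mathbf{0}$ for some $i \in \{1, \ldots, m\}$. Thus, the statistic is
used to test that there are no auto- and cross correlations in the vector series $\mathbf{r}_t$.
The test statistic assumes the form

$$Q_k(m) = T^2 \sum_{\ell=1}^{m} \frac{1}{T - \ell} \, \mathrm{tr}\!\left(\hat{\boldsymbol{\Gamma}}'_\ell \hat{\boldsymbol{\Gamma}}_0^{-1} \hat{\boldsymbol{\Gamma}}_\ell \hat{\boldsymbol{\Gamma}}_0^{-1}\right), \tag{8.7}$$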


where $T$ is the sample size, $k$ is the dimension of $\mathbf{r}_t$, and $\mathrm{tr}(\mathbf{A})$ is the trace of
the matrix $\mathbf{A}$, which is the sum of the diagonal elements of $\mathbf{A}$. Under the null
hypothesis and some regularity conditions, $Q_k(m)$ follows asymptotically a chi-squared
distribution with $k^2 m$ degrees of freedom.

TABLE 8.2 Sample Cross-Correlation Matrices of Monthly Simple Returns of Five
Indexes of U.S. Government Bonds: January 1942 to December 1999

Cross-Correlations

            Lag 1                               Lag 2
0.10  0.08  0.11  0.12  0.16      -0.01   0.00  0.00  -0.03  0.03
0.10  0.08  0.12  0.14  0.17      -0.01   0.00  0.00  -0.04  0.02
0.09  0.08  0.09  0.13  0.18       0.01   0.01  0.01  -0.02  0.07
0.14  0.12  0.15  0.14  0.22      -0.02  -0.01  0.00  -0.04  0.07
0.17  0.15  0.21  0.22  0.40      -0.02   0.00  0.02   0.02  0.22

Simplified Cross-Correlation Matrices

[Symbol matrices are not legible in the source.]


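Remark. The $Q_k(m)$ statistics can be rewritten in terms of the sample cross-correlation
matrices $\hat{\boldsymbol{\rho}}_\ell$. Using the Kronecker product $\otimes$ and vectorization of matrices
discussed in Appendix A of this chapter, we have

$$Q_k(m) = T^2 \sum_{\ell=1}^{m} \frac{1}{T - \ell} \, \mathbf{b}'_\ell \left(\hat{\boldsymbol{\rho}}_0^{-1} \otimes \hat{\boldsymbol{\rho}}_0^{-1}\right) \mathbf{b}_\ell,$$

where $\mathbf{b}_\ell = \mathrm{vec}(\hat{\boldsymbol{\rho}}'_\ell)$. The test statistic proposed by Li and McLeod (1981) is

$$Q^*_k(m) = T \sum_{\ell=1}^{m} \mathbf{b}'_\ell \left(\hat{\boldsymbol{\rho}}_0^{-1} \otimes \hat{\boldsymbol{\rho}}_0^{-1}\right) \mathbf{b}_\ell + \frac{k^2 m (m + 1)}{2T},$$

which is asymptotically equivalent to $Q_k(m)$.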


Applying the $Q_k(m)$ statistics to the bivariate monthly log returns of IBM stock
and the S&P 500 index of Example 8.1, we have $Q_2(1) = 9.81$, $Q_2(5) = 47.06$,
and $Q_2(10) = 71.65$. Based on asymptotic chi-squared distributions with degrees
of freedom 4, 20, and 40, the p values of these $Q_2(m)$ statistics are 0.044, 0.001,
and 0.002, respectively. The portmanteau tests thus confirm the existence of serial
dependence in the bivariate return series at the 5% significance level. For the
five-dimensional monthly simple returns of bond indexes in Example 8.2, we have
$Q_5(5) = 1065.63$, which is highly significant compared with a chi-squared
distribution with 125 degrees of freedom.

The $Q_k(m)$ statistic is a joint test for checking whether the first $m$ cross-correlation
matrices of $\mathbf{r}_t$ are zero. If it rejects the null hypothesis, then we build a multivariate
model for the series to study the lead-lag relationships between the component
series. In what follows, we discuss some simple vector models useful for modeling
the linear dynamic structure of a multivariate financial time series.
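
For readers who want to see the arithmetic behind Eq. (8.7), the following Python sketch computes $Q_k(m)$ from scratch using only the standard library. The helper names and the toy data are assumptions made for this illustration; the p value would be obtained separately by comparing the result with a chi-squared distribution with $k^2 m$ degrees of freedom.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small k)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]

def cross_cov(series, lag):
    """Sample cross-covariance matrix of Eq. (8.5)."""
    k, T = len(series), len(series[0])
    means = [sum(x) / T for x in series]
    return [[sum((series[i][t] - means[i]) * (series[j][t - lag] - means[j])
                 for t in range(lag, T)) / T
             for j in range(k)] for i in range(k)]

def portmanteau_q(series, m):
    """Multivariate portmanteau statistic Q_k(m) of Eq. (8.7)."""
    T = len(series[0])
    g0_inv = inverse(cross_cov(series, 0))
    total = 0.0
    for lag in range(1, m + 1):
        g = cross_cov(series, lag)
        prod = matmul(matmul(transpose(g), g0_inv), matmul(g, g0_inv))
        total += sum(prod[i][i] for i in range(len(prod))) / (T - lag)
    return T * T * total

# Toy data (hypothetical): y is x delayed by one period.
x = [1.0, -2.0, 0.5, 3.0, -1.5, 2.0, -0.5, 1.5]
y = [0.0] + x[:-1]
q1 = portmanteau_q([x, y], 1)
q2 = portmanteau_q([x, y], 2)
```

For $k = 1$ the statistic reduces to $T^2 \sum_{\ell} \hat{\rho}_\ell^2 / (T - \ell)$, which gives a convenient sanity check on the implementation.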


8.2 VECTOR AUTOREGRESSIVE MODELS 


A simple vector model useful in modeling asset returns is the vector autoregressive
(VAR) model. A multivariate time series $\mathbf{r}_t$ is a VAR process of order 1, or VAR(1)
for short, if it follows the model

$$\mathbf{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{\Phi} \mathbf{r}_{t-1} + \mathbf{a}_t, \tag{8.8}$$

where $\boldsymbol{\phi}_0$ is a $k$-dimensional vector, $\boldsymbol{\Phi}$ is a $k \times k$ matrix, and $\{\mathbf{a}_t\}$ is a sequence of
serially uncorrelated random vectors with mean zero and covariance matrix $\boldsymbol{\Sigma}$. In
application, the covariance matrix $\boldsymbol{\Sigma}$ is required to be positive definite; otherwise,
the dimension of $\mathbf{r}_t$ can be reduced. In the literature, it is often assumed that $\mathbf{a}_t$ is
multivariate normal.

Consider the bivariate case [i.e., $k = 2$, $\mathbf{r}_t = (r_{1t}, r_{2t})'$, and $\mathbf{a}_t = (a_{1t}, a_{2t})'$]. The
VAR(1) model consists of the following two equations:

$$r_{1t} = \phi_{10} + \Phi_{11} r_{1,t-1} + \Phi_{12} r_{2,t-1} + a_{1t},$$

$$r_{2t} = \phi_{20} + \Phi_{21} r_{1,t-1} + \Phi_{22} r_{2,t-1} + a_{2t},$$

where $\Phi_{ij}$ is the $(i, j)$th element of $\boldsymbol{\Phi}$ and $\phi_{i0}$ is the $i$th element of $\boldsymbol{\phi}_0$. Based on
the first equation, $\Phi_{12}$ denotes the linear dependence of $r_{1t}$ on $r_{2,t-1}$ in the presence
of $r_{1,t-1}$. Therefore, $\Phi_{12}$ is the conditional effect of $r_{2,t-1}$ on $r_{1t}$ given $r_{1,t-1}$. If
$\Phi_{12} = 0$, then $r_{1t}$ does not depend on $r_{2,t-1}$, and the model shows that $r_{1t}$ only
depends on its own past. Similarly, if $\Phi_{21} = 0$, then the second equation shows
that $r_{2t}$ does not depend on $r_{1,t-1}$ when $r_{2,t-1}$ is given.

Consider the two equations jointly. If $\Phi_{12} = 0$ and $\Phi_{21} \ne 0$, then there is a
unidirectional relationship from $r_{1t}$ to $r_{2t}$. If $\Phi_{12} = \Phi_{21} = 0$, then $r_{1t}$ and $r_{2t}$ are
uncoupled. If $\Phi_{12} \ne 0$ and $\Phi_{21} \ne 0$, then there is a feedback relationship between
the two series.
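
A small simulation can make the unidirectional case concrete. The sketch below is illustrative only: the coefficient values, the identity shock covariance, the burn-in length, and the function names are all assumptions chosen here. It simulates a bivariate VAR(1) with $\Phi_{12} = 0$ and $\Phi_{21} = 0.8$ and compares the two lag-1 cross correlations.

```python
import math
import random

def simulate_var1(phi0, Phi, T, seed=0):
    """Simulate a bivariate VAR(1) with independent N(0, 1) shocks (Sigma = I)."""
    rng = random.Random(seed)
    r = [0.0, 0.0]
    path = []
    for _ in range(T + 100):                 # the first 100 draws are burn-in
        a = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        r = [phi0[i] + Phi[i][0] * r[0] + Phi[i][1] * r[1] + a[i]
             for i in range(2)]
        path.append(r)
    return path[100:]

def lag1_corr(x, y):
    """Sample correlation between x_t and y_{t-1}."""
    T = len(x)
    mx, my = sum(x) / T, sum(y) / T
    cov = sum((x[t] - mx) * (y[t - 1] - my) for t in range(1, T)) / T
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / T)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / T)
    return cov / (sx * sy)

# Hypothetical coefficients: Phi_12 = 0 and Phi_21 = 0.8 (r1 leads r2).
path = simulate_var1([0.0, 0.0], [[0.0, 0.0], [0.8, 0.0]], T=2000)
r1 = [p[0] for p in path]
r2 = [p[1] for p in path]
rho_12 = lag1_corr(r1, r2)   # r1t against r2,t-1: should be near zero
rho_21 = lag1_corr(r2, r1)   # r2t against r1,t-1: should be clearly nonzero
```

Under these assumptions the population value of the second correlation is $0.8/\sqrt{1.64} \approx 0.62$, whereas the first is zero, so the sample values should differ markedly.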


8.2.1 Reduced and Structural Forms 


In general, the coefficient matrix $\boldsymbol{\Phi}$ of Eq. (8.8) measures the dynamic dependence
of $\mathbf{r}_t$. The concurrent relationship between $r_{1t}$ and $r_{2t}$ is shown by the off-diagonal
element $\sigma_{12}$ of the covariance matrix $\boldsymbol{\Sigma}$ of $\mathbf{a}_t$. If $\sigma_{12} = 0$, then there is no
concurrent linear relationship between the two component series. In the econometric
literature, the VAR(1) model in Eq. (8.8) is called a reduced-form model because it
does not show explicitly the concurrent dependence between the component series.
If necessary, an explicit expression involving the concurrent relationship can be
deduced from the reduced-form model by a simple linear transformation. Because
$\boldsymbol{\Sigma}$ is positive definite, there exists a lower triangular matrix $\mathbf{L}$ with unit diagonal
elements and a diagonal matrix $\mathbf{G}$ such that $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{G}\mathbf{L}'$; see Appendix A on
Cholesky decomposition. Therefore, $\mathbf{L}^{-1} \boldsymbol{\Sigma} (\mathbf{L}')^{-1} = \mathbf{G}$.

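Define $\mathbf{b}_t = (b_{1t}, \ldots, b_{kt})' = \mathbf{L}^{-1} \mathbf{a}_t$. Then

$$E(\mathbf{b}_t) = \mathbf{L}^{-1} E(\mathbf{a}_t) = \mathbf{0}, \qquad \mathrm{Cov}(\mathbf{b}_t) = \mathbf{L}^{-1} \boldsymbol{\Sigma} (\mathbf{L}^{-1})' = \mathbf{L}^{-1} \boldsymbol{\Sigma} (\mathbf{L}')^{-1} = \mathbf{G}.$$

Since $\mathbf{G}$ is a diagonal matrix, the components of $\mathbf{b}_t$ are uncorrelated. Multiplying
$\mathbf{L}^{-1}$ from the left to model (8.8), we obtain

$$\mathbf{L}^{-1} \mathbf{r}_t = \mathbf{L}^{-1} \boldsymbol{\phi}_0 + \mathbf{L}^{-1} \boldsymbol{\Phi} \mathbf{r}_{t-1} + \mathbf{L}^{-1} \mathbf{a}_t = \boldsymbol{\phi}_0^* + \boldsymbol{\Phi}^* \mathbf{r}_{t-1} + \mathbf{b}_t, \tag{8.9}$$

where $\boldsymbol{\phi}_0^* = \mathbf{L}^{-1} \boldsymbol{\phi}_0$ is a $k$-dimensional vector and $\boldsymbol{\Phi}^* = \mathbf{L}^{-1} \boldsymbol{\Phi}$ is a $k \times k$ matrix.
Because of the special matrix structure, the $k$th row of $\mathbf{L}^{-1}$ is in the form
$(w_{k1}, w_{k2}, \ldots, w_{k,k-1}, 1)$. Consequently, the $k$th equation of model (8.9) is

$$r_{kt} + \sum_{i=1}^{k-1} w_{ki} r_{it} = \phi_{k,0}^* + \sum_{i=1}^{k} \Phi_{ki}^* r_{i,t-1} + b_{kt}, \tag{8.10}$$

where $\phi_{k,0}^*$ is the $k$th element of $\boldsymbol{\phi}_0^*$ and $\Phi_{ki}^*$ is the $(k, i)$th element of $\boldsymbol{\Phi}^*$. Because
$b_{kt}$ is uncorrelated with $b_{it}$ for $1 \le i < k$, Eq. (8.10) shows explicitly the concurrent
linear dependence of $r_{kt}$ on $r_{it}$, where $1 \le i \le k - 1$. This equation is referred to
as a structural equation for $r_{kt}$ in the econometric literature.

For any other component $r_{it}$ of $\mathbf{r}_t$, we can rearrange the VAR(1) model so that
$r_{it}$ becomes the last component of $\mathbf{r}_t$. The prior transformation method can then be
applied to obtain a structural equation for $r_{it}$. Therefore, the reduced-form model
(8.8) is equivalent to the structural form used in the econometric literature. In time
series analysis, the reduced-form model is commonly used for two reasons. The
first reason is ease in estimation. The second and main reason is that the concurrent
correlations cannot be used in forecasting.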


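Example 8.3. To illustrate the transformation from a reduced-form model to
structural equations, consider the bivariate AR(1) model

$$\begin{bmatrix} r_{1t} \\ r_{2t} \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.4 \end{bmatrix} + \begin{bmatrix} 0.2 & 0.3 \\ -0.6 & 1.1 \end{bmatrix} \begin{bmatrix} r_{1,t-1} \\ r_{2,t-1} \end{bmatrix} + \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}, \qquad \boldsymbol{\Sigma} = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}.$$

For this particular covariance matrix $\boldsymbol{\Sigma}$, the lower triangular matrix

$$\mathbf{L}^{-1} = \begin{bmatrix} 1.0 & 0.0 \\ -0.5 & 1.0 \end{bmatrix}$$

provides a Cholesky decomposition [i.e., $\mathbf{L}^{-1} \boldsymbol{\Sigma} (\mathbf{L}')^{-1}$ is a diagonal matrix].
Premultiplying $\mathbf{L}^{-1}$ to the previous bivariate AR(1) model, we obtain

$$\begin{bmatrix} 1.0 & 0.0 \\ -0.5 & 1.0 \end{bmatrix} \begin{bmatrix} r_{1t} \\ r_{2t} \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.3 \end{bmatrix} + \begin{bmatrix} 0.2 & 0.3 \\ -0.7 & 0.95 \end{bmatrix} \begin{bmatrix} r_{1,t-1} \\ r_{2,t-1} \end{bmatrix} + \begin{bmatrix} b_{1t} \\ b_{2t} \end{bmatrix}, \qquad \mathbf{G} = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix},$$

where $\mathbf{G} = \mathrm{Cov}(\mathbf{b}_t)$. The second equation of this transformed model gives

$$r_{2t} = 0.3 + 0.5 r_{1t} - 0.7 r_{1,t-1} + 0.95 r_{2,t-1} + b_{2t},$$

which shows explicitly the linear dependence of $r_{2t}$ on $r_{1t}$.

Rearranging the order of elements in $\mathbf{r}_t$, the bivariate AR(1) model becomes

$$\begin{bmatrix} r_{2t} \\ r_{1t} \end{bmatrix} = \begin{bmatrix} 0.4 \\ 0.2 \end{bmatrix} + \begin{bmatrix} 1.1 & -0.6 \\ 0.3 & 0.2 \end{bmatrix} \begin{bmatrix} r_{2,t-1} \\ r_{1,t-1} \end{bmatrix} + \begin{bmatrix} a_{2t} \\ a_{1t} \end{bmatrix}, \qquad \boldsymbol{\Sigma} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}.$$

The lower triangular matrix needed in the Cholesky decomposition of $\boldsymbol{\Sigma}$ becomes

$$\mathbf{L}^{-1} = \begin{bmatrix} 1.0 & 0.0 \\ -1.0 & 1.0 \end{bmatrix}.$$

Premultiplying $\mathbf{L}^{-1}$ to the earlier rearranged VAR(1) model, we obtain

$$\begin{bmatrix} 1.0 & 0.0 \\ -1.0 & 1.0 \end{bmatrix} \begin{bmatrix} r_{2t} \\ r_{1t} \end{bmatrix} = \begin{bmatrix} 0.4 \\ -0.2 \end{bmatrix} + \begin{bmatrix} 1.1 & -0.6 \\ -0.8 & 0.8 \end{bmatrix} \begin{bmatrix} r_{2,t-1} \\ r_{1,t-1} \end{bmatrix} + \begin{bmatrix} c_{1t} \\ c_{2t} \end{bmatrix}, \qquad \mathbf{G} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},$$

where $\mathbf{G} = \mathrm{Cov}(\mathbf{c}_t)$. The second equation now gives

$$r_{1t} = -0.2 + 1.0 r_{2t} - 0.8 r_{2,t-1} + 0.8 r_{1,t-1} + c_{2t}.$$

Again this equation shows explicitly the concurrent linear dependence of $r_{1t}$ on $r_{2t}$.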
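
The arithmetic of Example 8.3 can be checked with a few lines of code. The following Python sketch handles only the bivariate case and uses function names (`ldl_2x2`, `structural_form`) that are introduced here for illustration; it computes the decomposition $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{G}\mathbf{L}'$ and the transformed coefficients of Eq. (8.9).

```python
def ldl_2x2(sigma):
    """Return (L, G) with Sigma = L G L', L unit lower triangular, G diagonal."""
    s11, s12, s22 = sigma[0][0], sigma[0][1], sigma[1][1]
    l21 = s12 / s11
    L = [[1.0, 0.0], [l21, 1.0]]
    G = [[s11, 0.0], [0.0, s22 - l21 * s12]]
    return L, G

def structural_form(phi0, Phi, sigma):
    """Premultiply a bivariate VAR(1) by L^{-1}, as in Eq. (8.9)."""
    L, G = ldl_2x2(sigma)
    w = -L[1][0]                      # L^{-1} = [[1, 0], [w, 1]]
    phi0_star = [phi0[0], w * phi0[0] + phi0[1]]
    Phi_star = [Phi[0][:], [w * Phi[0][j] + Phi[1][j] for j in range(2)]]
    return w, phi0_star, Phi_star, G

# First ordering of Example 8.3: (r1t, r2t)
w, phi0_star, Phi_star, G = structural_form(
    [0.2, 0.4], [[0.2, 0.3], [-0.6, 1.1]], [[2.0, 1.0], [1.0, 1.0]])

# Rearranged ordering of Example 8.3: (r2t, r1t)
w2, phi0_star2, Phi_star2, G2 = structural_form(
    [0.4, 0.2], [[1.1, -0.6], [0.3, 0.2]], [[1.0, 1.0], [1.0, 2.0]])

print(w, [round(v, 2) for v in phi0_star], [round(v, 2) for v in Phi_star[1]])
```

Up to floating-point rounding, the second rows reproduce the coefficients of the two structural equations in the example: the concurrent coefficient is $-w$, so $r_{2t}$ loads on $r_{1t}$ with 0.5 in the first ordering and $r_{1t}$ loads on $r_{2t}$ with 1.0 in the second.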


8.2.2 Stationarity Condition and Moments of a VAR(1) Model 


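Assume that the VAR(1) model in Eq. (8.8) is weakly stationary. Taking expectation
of the model and using $E(\mathbf{a}_t) = \mathbf{0}$, we obtain

$$E(\mathbf{r}_t) = \boldsymbol{\phi}_0 + \boldsymbol{\Phi} E(\mathbf{r}_{t-1}).$$

Since $E(\mathbf{r}_t)$ is time invariant, we have

$$\boldsymbol{\mu} \equiv E(\mathbf{r}_t) = (\mathbf{I} - \boldsymbol{\Phi})^{-1} \boldsymbol{\phi}_0$$

provided that the matrix $\mathbf{I} - \boldsymbol{\Phi}$ is nonsingular, where $\mathbf{I}$ is the $k \times k$ identity matrix.
Using $\boldsymbol{\phi}_0 = (\mathbf{I} - \boldsymbol{\Phi}) \boldsymbol{\mu}$, the VAR(1) model in Eq. (8.8) can be written as

$$(\mathbf{r}_t - \boldsymbol{\mu}) = \boldsymbol{\Phi} (\mathbf{r}_{t-1} - \boldsymbol{\mu}) + \mathbf{a}_t.$$

Let $\tilde{\mathbf{r}}_t = \mathbf{r}_t - \boldsymbol{\mu}$ be the mean-corrected time series. Then the VAR(1) model
becomes

$$\tilde{\mathbf{r}}_t = \boldsymbol{\Phi} \tilde{\mathbf{r}}_{t-1} + \mathbf{a}_t. \tag{8.11}$$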

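This model can be used to derive properties of a VAR(1) model. By repeated
substitutions, we can rewrite Eq. (8.11) as

$$\tilde{\mathbf{r}}_t = \mathbf{a}_t + \boldsymbol{\Phi} \mathbf{a}_{t-1} + \boldsymbol{\Phi}^2 \mathbf{a}_{t-2} + \boldsymbol{\Phi}^3 \mathbf{a}_{t-3} + \cdots.$$

This expression shows several characteristics of a VAR(1) process. First, since
$\mathbf{a}_t$ is serially uncorrelated, it follows that $\mathrm{Cov}(\mathbf{a}_t, \mathbf{r}_{t-1}) = \mathbf{0}$. In fact, $\mathbf{a}_t$ is not
correlated with $\mathbf{r}_{t-\ell}$ for all $\ell > 0$. For this reason, $\mathbf{a}_t$ is referred to as the shock or
innovation of the series at time $t$. It turns out that, similar to the univariate case, $\mathbf{a}_t$
is uncorrelated with the past value $\mathbf{r}_{t-j}$ ($j > 0$) for all time series models. Second,
postmultiplying the expression by $\mathbf{a}'_t$, taking expectation, and using the fact of no
serial correlations in the $\mathbf{a}_t$ process, we obtain $\mathrm{Cov}(\mathbf{r}_t, \mathbf{a}_t) = \boldsymbol{\Sigma}$. Third, for a VAR(1)
model, $\mathbf{r}_t$ depends on the past innovation $\mathbf{a}_{t-j}$ with coefficient matrix $\boldsymbol{\Phi}^j$. For such
dependence to be meaningful, $\boldsymbol{\Phi}^j$ must converge to zero as $j \to \infty$. This means
that the $k$ eigenvalues of $\boldsymbol{\Phi}$ must be less than 1 in modulus; otherwise, $\boldsymbol{\Phi}^j$ will
either explode or converge to a nonzero matrix as $j \to \infty$. As a matter of fact, the
requirement that all eigenvalues of $\boldsymbol{\Phi}$ are less than 1 in modulus is the necessary
and sufficient condition for weak stationarity of $\mathbf{r}_t$ provided that the covariance
matrix of $\mathbf{a}_t$ exists. Notice that this stationarity condition reduces to that of the
univariate AR(1) case in which the condition is $|\phi| < 1$. Furthermore, because

$$|\lambda \mathbf{I} - \boldsymbol{\Phi}| = \lambda^k \left| \mathbf{I} - \boldsymbol{\Phi} \frac{1}{\lambda} \right|,$$

the eigenvalues of $\boldsymbol{\Phi}$ are the inverses of the zeros of the determinant $|\mathbf{I} - \boldsymbol{\Phi} B|$.
Thus, an equivalent sufficient and necessary condition for stationarity of $\mathbf{r}_t$ is that all
zeros of the determinant $|\boldsymbol{\Phi}(B)|$ are greater than one in modulus; that is, all zeros are
outside the unit circle in the complex plane. Fourth, using the expression, we have

$$\mathrm{Cov}(\mathbf{r}_t) = \boldsymbol{\Gamma}_0 = \boldsymbol{\Sigma} + \boldsymbol{\Phi} \boldsymbol{\Sigma} \boldsymbol{\Phi}' + \boldsymbol{\Phi}^2 \boldsymbol{\Sigma} (\boldsymbol{\Phi}^2)' + \cdots = \sum_{i=0}^{\infty} \boldsymbol{\Phi}^i \boldsymbol{\Sigma} (\boldsymbol{\Phi}^i)',$$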


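where it is understood that $\boldsymbol{\Phi}^0 = \mathbf{I}$, the $k \times k$ identity matrix.

Postmultiplying $\tilde{\mathbf{r}}'_{t-\ell}$ to Eq. (8.11), taking expectation, and using the result
$\mathrm{Cov}(\mathbf{a}_t, \mathbf{r}_{t-j}) = E(\mathbf{a}_t \tilde{\mathbf{r}}'_{t-j}) = \mathbf{0}$ for $j > 0$, we obtain

$$E(\tilde{\mathbf{r}}_t \tilde{\mathbf{r}}'_{t-\ell}) = \boldsymbol{\Phi} E(\tilde{\mathbf{r}}_{t-1} \tilde{\mathbf{r}}'_{t-\ell}), \qquad \ell > 0.$$

Therefore,

$$\boldsymbol{\Gamma}_\ell = \boldsymbol{\Phi} \boldsymbol{\Gamma}_{\ell-1}, \qquad \ell > 0, \tag{8.12}$$

where $\boldsymbol{\Gamma}_j$ is the lag-$j$ cross-covariance matrix of $\mathbf{r}_t$. Again this result is a
generalization of that of a univariate AR(1) process. By repeated substitutions, Eq. (8.12)
shows that

$$\boldsymbol{\Gamma}_\ell = \boldsymbol{\Phi}^\ell \boldsymbol{\Gamma}_0, \qquad \text{for } \ell > 0.$$

Pre- and postmultiplying Eq. (8.12) by $\mathbf{D}^{-1}$, where $\mathbf{D}$ is the diagonal matrix of
standard deviations used in Eq. (8.3), we obtain

$$\boldsymbol{\rho}_\ell = \mathbf{D}^{-1} \boldsymbol{\Phi} \boldsymbol{\Gamma}_{\ell-1} \mathbf{D}^{-1} = \mathbf{D}^{-1} \boldsymbol{\Phi} \mathbf{D} \, \mathbf{D}^{-1} \boldsymbol{\Gamma}_{\ell-1} \mathbf{D}^{-1} = \boldsymbol{\Upsilon} \boldsymbol{\rho}_{\ell-1},$$

where $\boldsymbol{\Upsilon} = \mathbf{D}^{-1} \boldsymbol{\Phi} \mathbf{D}$. Consequently, the CCM of a VAR(1) model satisfies

$$\boldsymbol{\rho}_\ell = \boldsymbol{\Upsilon}^\ell \boldsymbol{\rho}_0, \qquad \text{for } \ell > 0.$$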
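
As a numerical illustration of this section, the sketch below checks the stationarity condition and evaluates the moment formulas for the coefficient matrix and shock covariance of Example 8.3. It is only a sketch: the eigenvalue routine is restricted to $2 \times 2$ matrices, the infinite sum for $\boldsymbol{\Gamma}_0$ is truncated at an arbitrarily chosen number of terms, and all function names are introduced here for illustration.

```python
import cmath

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def eigenvalues_2x2(P):
    """Eigenvalues of a 2 x 2 matrix from its characteristic polynomial."""
    tr = P[0][0] + P[1][1]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0

def gamma0_var1(Phi, Sigma, terms=500):
    """Truncated sum of Phi^i Sigma (Phi^i)' for i = 0, ..., terms - 1."""
    k = len(Phi)
    power = [[float(i == j) for j in range(k)] for i in range(k)]  # Phi^0 = I
    total = [[0.0] * k for _ in range(k)]
    for _ in range(terms):
        term = matmul(matmul(power, Sigma), transpose(power))
        total = [[total[i][j] + term[i][j] for j in range(k)] for i in range(k)]
        power = matmul(power, Phi)
    return total

Phi = [[0.2, 0.3], [-0.6, 1.1]]     # coefficient matrix of Example 8.3
Sigma = [[2.0, 1.0], [1.0, 1.0]]    # shock covariance of Example 8.3
lam1, lam2 = eigenvalues_2x2(Phi)
stationary = max(abs(lam1), abs(lam2)) < 1.0
Gamma0 = gamma0_var1(Phi, Sigma)
Gamma1 = matmul(Phi, Gamma0)        # Eq. (8.12) with lag 1
```

For this matrix the characteristic polynomial is $\lambda^2 - 1.3\lambda + 0.4$, with roots 0.8 and 0.5, so the model is stationary. A useful check on the truncated sum is that $\boldsymbol{\Gamma}_0$ should satisfy $\boldsymbol{\Gamma}_0 = \boldsymbol{\Phi} \boldsymbol{\Gamma}_0 \boldsymbol{\Phi}' + \boldsymbol{\Sigma}$ up to numerical error.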


8.2.3 Vector AR(p) Models 


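The generalization of VAR(1) to VAR(p) models is straightforward. The time series
$\mathbf{r}_t$ follows a VAR(p) model if it satisfies

$$\mathbf{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{\Phi}_1 \mathbf{r}_{t-1} + \cdots + \boldsymbol{\Phi}_p \mathbf{r}_{t-p} + \mathbf{a}_t, \qquad p > 0, \tag{8.13}$$

where $\boldsymbol{\phi}_0$ and $\mathbf{a}_t$ are defined as before, and $\boldsymbol{\Phi}_j$ are $k \times k$ matrices. Using the
back-shift operator $B$, the VAR(p) model can be written as

$$(\mathbf{I} - \boldsymbol{\Phi}_1 B - \cdots - \boldsymbol{\Phi}_p B^p) \mathbf{r}_t = \boldsymbol{\phi}_0 + \mathbf{a}_t,$$

where $\mathbf{I}$ is the $k \times k$ identity matrix. This representation can be written in a compact
form as

$$\boldsymbol{\Phi}(B) \mathbf{r}_t = \boldsymbol{\phi}_0 + \mathbf{a}_t,$$

where $\boldsymbol{\Phi}(B) = \mathbf{I} - \boldsymbol{\Phi}_1 B - \cdots - \boldsymbol{\Phi}_p B^p$ is a matrix polynomial. If $\mathbf{r}_t$ is weakly
stationary, then we have

$$\boldsymbol{\mu} = E(\mathbf{r}_t) = (\mathbf{I} - \boldsymbol{\Phi}_1 - \cdots - \boldsymbol{\Phi}_p)^{-1} \boldsymbol{\phi}_0 = [\boldsymbol{\Phi}(1)]^{-1} \boldsymbol{\phi}_0$$

provided that the inverse exists. Let $\tilde{\mathbf{r}}_t = \mathbf{r}_t - \boldsymbol{\mu}$. The VAR(p) model becomes

$$\tilde{\mathbf{r}}_t = \boldsymbol{\Phi}_1 \tilde{\mathbf{r}}_{t-1} + \cdots + \boldsymbol{\Phi}_p \tilde{\mathbf{r}}_{t-p} + \mathbf{a}_t. \tag{8.14}$$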

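Using this equation and the same techniques as those for VAR(1) models, we
obtain that


• $\mathrm{Cov}(\mathbf{r}_t, \mathbf{a}_t) = \boldsymbol{\Sigma}$, the covariance matrix of $\mathbf{a}_t$.
• $\mathrm{Cov}(\mathbf{r}_{t-\ell}, \mathbf{a}_t) = \mathbf{0}$ for $\ell > 0$.
• $\boldsymbol{\Gamma}_\ell = \boldsymbol{\Phi}_1 \boldsymbol{\Gamma}_{\ell-1} + \cdots + \boldsymbol{\Phi}_p \boldsymbol{\Gamma}_{\ell-p}$ for $\ell > 0$.


The last property is called the moment equations of a VAR(p) model. It is a
multivariate version of the Yule-Walker equation of a univariate AR(p) model. In
terms of CCM, the moment equations become

$$\boldsymbol{\rho}_\ell = \boldsymbol{\Upsilon}_1 \boldsymbol{\rho}_{\ell-1} + \cdots + \boldsymbol{\Upsilon}_p \boldsymbol{\rho}_{\ell-p} \qquad \text{for } \ell > 0,$$

where $\boldsymbol{\Upsilon}_i = \mathbf{D}^{-1} \boldsymbol{\Phi}_i \mathbf{D}$, with $\mathbf{D}$ the diagonal matrix of standard deviations used in Eq. (8.3).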

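A simple approach to understanding properties of the VAR(p) model in Eq.
(8.13) is to make use of the results of the VAR(1) model in Eq. (8.8). This can be
achieved by transforming the VAR(p) model of $\mathbf{r}_t$ into a $kp$-dimensional VAR(1)
model. Specifically, let $\mathbf{x}_t = (\tilde{\mathbf{r}}'_{t-p+1}, \tilde{\mathbf{r}}'_{t-p+2}, \ldots, \tilde{\mathbf{r}}'_t)'$ and $\mathbf{b}_t = (\mathbf{0}, \ldots, \mathbf{0}, \mathbf{a}'_t)'$ be
two $kp$-dimensional processes. The mean of $\mathbf{b}_t$ is zero and the covariance matrix
of $\mathbf{b}_t$ is a $kp \times kp$ matrix with zero everywhere except for the lower right corner,
which is $\boldsymbol{\Sigma}$. The VAR(p) model for $\mathbf{r}_t$ can then be written in the form

$$\mathbf{x}_t = \boldsymbol{\Phi}^* \mathbf{x}_{t-1} + \mathbf{b}_t, \tag{8.15}$$

where $\boldsymbol{\Phi}^*$ is a $kp \times kp$ matrix given by

$$\boldsymbol{\Phi}^* = \begin{bmatrix}
\mathbf{0} & \mathbf{I} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{I} \\
\boldsymbol{\Phi}_p & \boldsymbol{\Phi}_{p-1} & \boldsymbol{\Phi}_{p-2} & \boldsymbol{\Phi}_{p-3} & \cdots & \boldsymbol{\Phi}_1
\end{bmatrix},$$

where $\mathbf{0}$ and $\mathbf{I}$ are the $k \times k$ zero matrix and identity matrix, respectively. In
the literature, $\boldsymbol{\Phi}^*$ is called the companion matrix of the matrix polynomial
$\boldsymbol{\Phi}(B)$.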
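
The block layout of the companion matrix is easy to get wrong, so a small constructive sketch may help. The Python function below is an illustration under assumptions made here: the name `companion`, the helper `power_max_abs`, and the two coefficient matrices are hypothetical and do not come from the text. The second helper is only a crude numerical indication of stationarity (powers of the companion matrix shrinking toward zero), not a replacement for computing eigenvalues.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def companion(phis):
    """Build the kp x kp matrix Phi* of Eq. (8.15) from [Phi_1, ..., Phi_p]."""
    p, k = len(phis), len(phis[0])
    n = k * p
    C = [[0.0] * n for _ in range(n)]
    for b in range(p - 1):                       # identity blocks
        for i in range(k):
            C[b * k + i][(b + 1) * k + i] = 1.0
    for j, Phi in enumerate(reversed(phis)):     # last block row: Phi_p, ..., Phi_1
        for r in range(k):
            for c in range(k):
                C[(p - 1) * k + r][j * k + c] = Phi[r][c]
    return C

def power_max_abs(C, n):
    """Largest absolute entry of C^n (crude check that powers die out)."""
    P = C
    for _ in range(n - 1):
        P = matmul(P, C)
    return max(abs(v) for row in P for v in row)

# Hypothetical VAR(2) coefficients with k = 2.
Phi1 = [[0.5, 0.1], [0.0, 0.4]]
Phi2 = [[0.2, 0.0], [0.1, 0.1]]
C = companion([Phi1, Phi2])
decay = power_max_abs(C, 200)
```

For $p = 2$ the state vector is $\mathbf{x}_t = (\tilde{\mathbf{r}}'_{t-1}, \tilde{\mathbf{r}}'_t)'$, so the first block row simply copies $\tilde{\mathbf{r}}_{t-1}$ forward and the last block row reproduces Eq. (8.14).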

Equation (8.15) is a VAR(1) model for $x_t$, which contains $\tilde{r}_t$ as its last k components.
The results of a VAR(1) model shown in the previous section can now be
used to derive properties of the VAR(p) model via Eq. (8.15). For example, from
the definition, $x_t$ is weakly stationary if and only if $r_t$ is weakly stationary. Therefore,
the necessary and sufficient condition of weak stationarity for the VAR(p)
model in Eq. (8.13) is that all eigenvalues of $\Phi^*$ in Eq. (8.15) are less than 1 in
modulus. It is easy to show that $|I - \Phi^* B| = |\Phi(B)|$. Therefore, similar to the
VAR(1) case, the necessary and sufficient condition is equivalent to all zeros of
the determinant $|\Phi(B)|$ being outside the unit circle.
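The stacking in Eq. (8.15) can be checked numerically. The following is a minimal sketch, assuming a hypothetical bivariate VAR(2) with made-up coefficient matrices `PHIS` and zero starting values; the function names are illustrative and not part of any package used in this chapter. It builds the companion matrix and confirms that iterating the kp-dimensional VAR(1) reproduces the path of the original VAR(p) recursion.

```python
import random

def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def companion(phis):
    """Stack [Phi_1, ..., Phi_p] (each k x k) into the kp x kp companion matrix."""
    p, k = len(phis), len(phis[0])
    n = k * p
    m = [[0.0] * n for _ in range(n)]
    for i in range(n - k):            # identity blocks above the last block row
        m[i][i + k] = 1.0
    for b in range(p):                # last block row: Phi_p, Phi_{p-1}, ..., Phi_1
        phi = phis[p - 1 - b]
        for i in range(k):
            for j in range(k):
                m[n - k + i][b * k + j] = phi[i][j]
    return m

def simulate_direct(phis, shocks):
    """Iterate r_t = Phi_1 r_{t-1} + ... + Phi_p r_{t-p} + a_t from zero start values."""
    p, k = len(phis), len(phis[0])
    hist = [[0.0] * k for _ in range(p)]
    for a in shocks:
        r = list(a)
        for lag, phi in enumerate(phis, start=1):
            contrib = mat_vec(phi, hist[-lag])
            r = [r[j] + contrib[j] for j in range(k)]
        hist.append(r)
    return hist[p:]

def simulate_companion(phis, shocks):
    """Iterate x_t = Phi* x_{t-1} + b_t and return the last k components of x_t."""
    p, k = len(phis), len(phis[0])
    star = companion(phis)
    x = [0.0] * (k * p)
    path = []
    for a in shocks:
        x = mat_vec(star, x)
        for j in range(k):            # b_t = (0, ..., 0, a_t')'
            x[k * (p - 1) + j] += a[j]
        path.append(x[-k:])
    return path

# Hypothetical coefficients, chosen only for illustration.
PHIS = [[[0.5, 0.1], [0.0, 0.3]], [[-0.2, 0.0], [0.1, 0.1]]]
rng = random.Random(0)
SHOCKS = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
direct = simulate_direct(PHIS, SHOCKS)
stacked = simulate_companion(PHIS, SHOCKS)
max_gap = max(abs(u - v) for ru, rv in zip(direct, stacked) for u, v in zip(ru, rv))
print("largest discrepancy between the two recursions:", max_gap)
```

The two paths agree up to floating-point rounding, which is the sense in which the VAR(p) model and its companion-form VAR(1) are the same model.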

Of particular relevance to financial time series analysis is the structure of the
coefficient matrices $\Phi_\ell$ of a VAR(p) model. For instance, if the (i, j)th element
$\Phi_{ij}(\ell)$ of $\Phi_\ell$ is zero for all $\ell$, then $r_{it}$ does not depend on the past values of $r_{jt}$. The
structure of the coefficient matrices $\Phi_\ell$ thus provides information on the lead-lag
relationship between the components of $r_t$.


8.2.4 Building a VAR(p) Model 


We continue to use the iterative procedure of order specification, estimation, and 
model checking to build a vector AR model for a given time series. The concept of 
partial autocorrelation function of a univariate series can be generalized to specify 
the order p of a vector series. Consider the following consecutive VAR models: 


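$$\begin{aligned}
r_t &= \phi_0 + \Phi_1 r_{t-1} + a_t \\
r_t &= \phi_0 + \Phi_1 r_{t-1} + \Phi_2 r_{t-2} + a_t \\
&\;\;\vdots \\
r_t &= \phi_0 + \Phi_1 r_{t-1} + \cdots + \Phi_i r_{t-i} + a_t \\
&\;\;\vdots
\end{aligned} \tag{8.16}$$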


Parameters of these models can be estimated by the ordinary least-squares (OLS)
method. This is called the multivariate linear regression estimation in multivariate
statistical analysis; see Johnson and Wichern (1998).

For the ith equation in Eq. (8.16), let $\hat{\Phi}_j^{(i)}$ be the OLS estimate of $\Phi_j$ and $\hat{\phi}_0^{(i)}$
be the estimate of $\phi_0$, where the superscript (i) is used to denote that the estimates
are for a VAR(i) model. Then the residual is

$$\hat{a}_t^{(i)} = r_t - \hat{\phi}_0^{(i)} - \hat{\Phi}_1^{(i)} r_{t-1} - \cdots - \hat{\Phi}_i^{(i)} r_{t-i}.$$

For i = 0, the residual is defined as $\hat{r}_t^{(0)} = r_t - \bar{r}$, where $\bar{r}$ is the sample mean of
$r_t$. The residual covariance matrix is defined as

$$\hat{\Sigma}_i = \frac{1}{T - 2i - 1} \sum_{t=i+1}^{T} \hat{a}_t^{(i)} [\hat{a}_t^{(i)}]', \qquad i \ge 0. \tag{8.17}$$


To specify the order p, one can test the hypothesis $H_0: \Phi_\ell = 0$ versus the alternative
hypothesis $H_a: \Phi_\ell \ne 0$ sequentially for $\ell = 1, 2, \ldots$. For example, using
the first equation in Eq. (8.16), we can test the hypothesis $H_0: \Phi_1 = 0$ versus the
alternative hypothesis $H_a: \Phi_1 \ne 0$. The test statistic is

$$M(1) = -\left(T - k - \tfrac{5}{2}\right) \ln\left(\frac{|\hat{\Sigma}_1|}{|\hat{\Sigma}_0|}\right),$$

where $\hat{\Sigma}_i$ is defined in Eq. (8.17) and |A| denotes the determinant of the matrix
A. Under some regularity conditions, the test statistic M(1) is asymptotically a
chi-squared distribution with $k^2$ degrees of freedom; see Tiao and Box (1981).

In general, we use the ith and (i − 1)th equations in Eq. (8.16) to test $H_0: \Phi_i = 0$
versus $H_a: \Phi_i \ne 0$; that is, testing a VAR(i) model versus a VAR(i − 1) model.
The test statistic is

$$M(i) = -\left(T - k - i - \tfrac{3}{2}\right) \ln\left(\frac{|\hat{\Sigma}_i|}{|\hat{\Sigma}_{i-1}|}\right). \tag{8.18}$$

Asymptotically, M(i) is distributed as a chi-squared distribution with $k^2$ degrees
of freedom.
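As a small illustration of Eq. (8.18), the sketch below evaluates M(i) from two residual-covariance determinants. The function name `m_stat` and the determinant values are hypothetical; only the sample size T = 996 and dimension k = 2 mirror the IBM and S&P 500 example, and the 5% critical value 9.5 is the one quoted for 4 degrees of freedom in Table 8.3.

```python
import math

def m_stat(T, k, i, det_i, det_prev):
    """M(i) of Eq. (8.18): tests a VAR(i) model against a VAR(i-1) model.

    det_i and det_prev are |Sigma_i| and |Sigma_{i-1}|, the determinants of
    the residual covariance matrices of the two fitted models.
    """
    return -(T - k - i - 1.5) * math.log(det_i / det_prev)

# Hypothetical determinants: adding lag 1 shrinks |Sigma| by 1 percent.
det_0 = 900.0
det_1 = 0.99 * det_0
stat = m_stat(T=996, k=2, i=1, det_i=det_1, det_prev=det_0)
print("M(1) =", round(stat, 2), "; exceeds 5% critical value 9.5:", stat > 9.5)
```

Because the statistic depends on the determinants only through their ratio, even a small proportional reduction in $|\hat{\Sigma}|$ can be significant when T is large.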

Alternatively, one can use the Akaike information criterion (AIC) or its variants
to select the order p. Assume that $a_t$ is multivariate normal and consider the ith
equation in Eq. (8.16). One can estimate the model by the maximum-likelihood
(ML) method. For AR models, the OLS estimates $\hat{\phi}_0$ and $\hat{\Phi}_j$ are equivalent to the
(conditional) ML estimates. However, there are differences between the estimates
of $\Sigma$. The ML estimate of $\Sigma$ is

$$\tilde{\Sigma}_i = \frac{1}{T} \sum_{t=i+1}^{T} \hat{a}_t^{(i)} [\hat{a}_t^{(i)}]'. \tag{8.19}$$

The AIC of a VAR(i) model under the normality assumption is defined as

$$\mathrm{AIC}(i) = \ln(|\tilde{\Sigma}_i|) + \frac{2k^2 i}{T}.$$

For a given vector time series, one selects the AR order p such that $\mathrm{AIC}(p) = \min_{0 \le i \le p_0} \mathrm{AIC}(i)$,
where $p_0$ is a prespecified positive integer.
Other information criteria available for VAR(i) models are

$$\mathrm{BIC}(i) = \ln(|\tilde{\Sigma}_i|) + \frac{k^2 i \ln(T)}{T},$$
$$\mathrm{HQ}(i) = \ln(|\tilde{\Sigma}_i|) + \frac{2k^2 i \ln(\ln(T))}{T}.$$

The HQ criterion is proposed by Hannan and Quinn (1979).
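The three criteria differ only in the penalty attached to the $k^2 i$ autoregressive parameters. The sketch below, with hypothetical function names and made-up values of $\ln|\tilde{\Sigma}_i|$, shows how the heavier BIC penalty can lead to a lower selected order than AIC for the same fitted models.

```python
import math

def info_criteria(log_det_sigma, k, i, T):
    """AIC, BIC, and HQ of a VAR(i) model given ln|Sigma_i| (ML estimate)."""
    n_par = k * k * i
    return {
        "AIC": log_det_sigma + 2.0 * n_par / T,
        "BIC": log_det_sigma + n_par * math.log(T) / T,
        "HQ": log_det_sigma + 2.0 * n_par * math.log(math.log(T)) / T,
    }

def select_order(log_dets, k, T, criterion="AIC"):
    """Return the order i in 0..p0 minimizing the criterion; log_dets[i] = ln|Sigma_i|."""
    scores = [info_criteria(ld, k, i, T)[criterion] for i, ld in enumerate(log_dets)]
    return min(range(len(scores)), key=scores.__getitem__)

# Hypothetical ln|Sigma_i| for i = 0, 1, 2, 3 (illustration only).
LOG_DETS = [7.00, 6.95, 6.94, 6.939]
aic_order = select_order(LOG_DETS, k=2, T=996, criterion="AIC")
bic_order = select_order(LOG_DETS, k=2, T=996, criterion="BIC")
print("AIC order:", aic_order, " BIC order:", bic_order)
```

With these inputs the small gain from the second lag is enough to offset the AIC penalty but not the BIC penalty, which is the same pattern seen later in the chapter when the two criteria disagree.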
Example 8.4. Assuming that the bivariate series of monthly log returns of IBM
stock and the S&P 500 index discussed in Example 8.1 follows a VAR model, we
apply the M(i) statistics and AIC to the data. Table 8.3 shows the results of these
statistics. Both statistics indicate that a VAR(5) model might be adequate for the
data. The M(i) statistics are marginally significant at lags 1, 3, and 5 at the 5%
level. The minimum of AIC occurs at order 5. For this particular instance, the M(i)
statistic is only marginally significant at the 1% level when i = 2, confirming the
previous observation that the dynamic linear dependence between the two return
series is weak.


Estimation and Model Checking 

For a specified VAR model, one can estimate the parameters using either the OLS
method or the ML method. The two methods are asymptotically equivalent. Under
some regularity conditions, the estimates are asymptotically normal; see Reinsel
(1993). A fitted model should then be checked carefully for any possible inadequacy.
The $Q_k(m)$ statistic can be applied to the residual series to check the
assumption that there are no serial or cross correlations in the residuals. For a
fitted VAR(p) model, the $Q_k(m)$ statistic of the residuals is asymptotically a
chi-squared distribution with $k^2 m - g$ degrees of freedom, where g is the number of
estimated parameters in the AR coefficient matrices; see Lütkepohl (2005).


Example 8.4 (Continued). Table 8.4(a) shows the estimation results of a
VAR(5) model for the bivariate series of monthly log returns of IBM stock and the
S&P 500 index. The specified model is in the form

$$r_t = \phi_0 + \Phi_1 r_{t-1} + \Phi_2 r_{t-2} + \Phi_3 r_{t-3} + \Phi_5 r_{t-5} + a_t, \tag{8.20}$$

where the first component of $r_t$ denotes IBM stock returns. For this particular
instance, we do not use the AR coefficient matrix at lag 4 because of the weak serial
dependence of the data. In general, when the M(i) statistics and the AIC criterion
specify a VAR(5) model, all five AR lags should be used. Table 8.4(b) shows the
estimation results after some statistically insignificant parameters are set to zero.
The $Q_k(m)$ statistics of the residual series for the fitted model in Table 8.4(b)
give $Q_2(4) = 16.64$ and $Q_2(8) = 31.55$. Since the fitted VAR(5) model has six
parameters in the AR coefficient matrices, these two $Q_k(m)$ statistics are distributed
asymptotically as a chi-squared distribution with degrees of freedom 10 and 26,
respectively. The p-values of the test statistics are 0.083 and 0.208, and hence the
fitted model is adequate at the 5% significance level. As shown by the univariate
analysis, the return series are likely to have conditional heteroscedasticity. We
discuss multivariate volatility in Chapter 10.


TABLE 8.3 Order Specification Statistics for Monthly Log Returns of IBM Stock
and S&P 500 Index from January 1926 to December 2008(a)

Order       1       2       3       4       5       6
M(i)    10.76   13.41   10.34    7.78   12.07    1.93
AIC     6.795   6.789   6.786   6.786   6.782   6.788

(a) The 5% and 1% critical values of a chi-squared distribution with 4 degrees of freedom are 9.5 and 13.3.

From the fitted model in Table 8.4(b), we make the following observations:
(a) The concurrent correlation coefficient between the two innovational series is
$24/\sqrt{48 \times 30} = 0.63$, which, as expected, is close to the sample correlation coefficient
between $r_{1t}$ and $r_{2t}$. (b) The two log return series have positive and significant
means, implying that the log prices of the two series had an upward trend over the
data span. (c) The model shows that

$$\mathrm{IBM}_t = 1.0 + 0.13\,\mathrm{SP5}_{t-1} - 0.09\,\mathrm{SP5}_{t-2} + 0.09\,\mathrm{SP5}_{t-5} + a_{1t},$$
$$\mathrm{SP5}_t = 0.4 + 0.08\,\mathrm{SP5}_{t-1} - 0.06\,\mathrm{SP5}_{t-3} + 0.09\,\mathrm{SP5}_{t-5} + a_{2t}.$$

Consequently, at the 5% significance level, there is a unidirectional dynamic relationship
from the monthly S&P 500 index return to the IBM return. If the S&P
500 index represents the U.S. stock market, then IBM return is affected by the past
movements of the market. However, past movements of IBM stock returns do not
significantly affect the U.S. market, even though the two returns have substantial
concurrent correlation. Finally, the fitted model can be written as

$$\begin{bmatrix} \mathrm{IBM}_t \\ \mathrm{SP5}_t \end{bmatrix}
= \begin{bmatrix} 1.0 \\ 0.4 \end{bmatrix}
+ \begin{bmatrix} 0.13 \\ 0.08 \end{bmatrix} \mathrm{SP5}_{t-1}
- \begin{bmatrix} 0.09 \\ 0 \end{bmatrix} \mathrm{SP5}_{t-2}
- \begin{bmatrix} 0 \\ 0.06 \end{bmatrix} \mathrm{SP5}_{t-3}
+ \begin{bmatrix} 0.09 \\ 0.09 \end{bmatrix} \mathrm{SP5}_{t-5}
+ \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix},$$

indicating that $\mathrm{SP5}_t$ is the driving factor of the bivariate series.


TABLE 8.4 Estimation Results of a VAR(5) Model for the Monthly Log Returns, in 
Percentages, of IBM Stock and S&P 500 Index from January 1926 to December 2008 


amel a) a | a [| ®% [| a [= 


(a) Full Model 


Estimate 24 
30 
Standard 
error 
Estimate 24 
30 


Standard 
error 
Forecasting

Treating a properly built model as the true model, one can apply the same techniques
as those in the univariate analysis to produce forecasts and standard deviations of
the associated forecast errors. For a VAR(p) model, the 1-step-ahead forecast at the
time origin h is $r_h(1) = \phi_0 + \sum_{i=1}^{p} \Phi_i r_{h+1-i}$, and the associated forecast error is
$e_h(1) = a_{h+1}$. The covariance matrix of the forecast error is $\Sigma$. For 2-step-ahead
forecasts, we substitute $r_{h+1}$ by its forecast to obtain

$$r_h(2) = \phi_0 + \Phi_1 r_h(1) + \sum_{i=2}^{p} \Phi_i r_{h+2-i},$$

and the associated forecast error is

$$e_h(2) = a_{h+2} + \Phi_1 [r_{h+1} - r_h(1)] = a_{h+2} + \Phi_1 a_{h+1}.$$

The covariance matrix of the forecast error is $\Sigma + \Phi_1 \Sigma \Phi_1'$. If $r_t$ is weakly stationary,
then the $\ell$-step-ahead forecast $r_h(\ell)$ converges to its mean vector $\mu$ as
the forecast horizon $\ell$ increases and the covariance matrix of its forecast error
converges to the covariance matrix of $r_t$.
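The recursion is easiest to see in the VAR(1) special case, where the $\ell$-step-ahead error covariance is $\sum_{j=0}^{\ell-1} \Phi_1^j \Sigma (\Phi_1^j)'$. The sketch below is restricted to that case and uses hypothetical parameter values and function names; it is not the procedure used by SCA or S-Plus.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def var1_forecasts(phi0, phi1, r_h, sigma, steps):
    """Return [(forecast, error covariance)] for horizons 1..steps of a VAR(1)."""
    k = len(phi0)
    point = list(r_h)
    psi = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]  # Phi_1^0
    cov = [[0.0] * k for _ in range(k)]
    out = []
    for _ in range(steps):
        # Replace the unknown future value by its own forecast.
        point = [phi0[i] + sum(phi1[i][j] * point[j] for j in range(k)) for i in range(k)]
        # Add the contribution Phi_1^j Sigma (Phi_1^j)' of one more shock.
        cov = mat_add(cov, mat_mul(mat_mul(psi, sigma), transpose(psi)))
        psi = mat_mul(phi1, psi)
        out.append((point, cov))
    return out

# Hypothetical VAR(1) parameters and last observation (illustration only).
PHI0 = [1.0, 0.4]
PHI1 = [[0.0, 0.1], [0.0, 0.1]]
SIGMA = [[48.0, 24.0], [24.0, 30.0]]
fc = var1_forecasts(PHI0, PHI1, r_h=[-5.0, -8.0], sigma=SIGMA, steps=60)
print("1-step forecast:", fc[0][0])
print("2-step error covariance:", fc[1][1])
print("60-step forecast:", fc[-1][0])
```

The 2-step covariance equals $\Sigma + \Phi_1 \Sigma \Phi_1'$, and the long-horizon forecast settles at the mean $(I - \Phi_1)^{-1}\phi_0$, in line with the convergence statement above.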

Table 8.5 provides 1-step- to 6-step-ahead forecasts of the monthly log returns, 
in percentages, of IBM stock and the S&P 500 index at the forecast origin h = 
996. These forecasts are obtained by the refined VAR(5) model in Table 8.4(b). 
As expected, the standard errors of the forecasts converge to the sample standard 
errors 7.03 and 5.53, respectively, for the two log return series. 

In summary, building a VAR model involves three steps: (a) Use the test statistic 
M(i) or some information criterion to identify the order, (b) estimate the specified 
model by using the least-squares method and, if necessary, reestimate the model 
by removing statistically insignificant parameters, and (c) use the $Q_k(m)$ statistic
of the residuals to check the adequacy of a fitted model. Other characteristics of 
the residual series, such as conditional heteroscedasticity and outliers, can also be 
checked. If the fitted model is adequate, then it can be used to obtain forecasts and 
make inference concerning the dynamic relationship between the variables. 

We used SCA to perform the analysis in this section. The commands used include 
miden, mtsm, mest, and mfore, where the prefix m stands for multivariate. Details 
of the commands and output are shown below. 


TABLE 8.5 Forecasts of a VAR(5) Model for Monthly Log Returns, in Percentages, 
of IBM Stock and S&P 500 Index: Forecast Origin Is December 2008 


Step                 1       2       3       4       5       6

IBM forecast      1.95    0.30   -0.82    0.14    1.16    1.29
Standard error    6.95    6.99    7.00    7.00    7.00    7.00
SP forecast       1.70    0.17   -1.26   -0.49    0.41    0.65
Standard error    5.48    5.50    5.50    5.51    5.51    5.53
SCA Demonstration
Output has been edited and % denotes explanation in the following:


input date, ibm, sp5. file 'm-ibmsp2608.txt'.
--% compute percentage log returns.
ibm=ln(ibm+1)*100
sp5=ln(sp5+1)*100
--% model identification
miden ibm,sp5. arfits 1 to 12.


TIME PERIOD ANALYZED ............ #1 TO 996 
EFFECTIVE NUMBER OF OBSERVATIONS (NOBE). . . 996 
SERIES   NAME    MEAN     STD. ERROR
   1     IBM    1.0891     7.0298
   2     SP5    0.4301     5.5346


NOTE: THE APPROX. STD. ERROR FOR THE ESTIMATED CORRELATIONS BELOW
IS (1/NOBE**.5) = 0.03169


SAMPLE CORRELATION MATRIX OF THE SERIES 


1.00 
0.65 1.00 
SUMMARIES OF CROSS CORRELATION MATRICES USING +,-,., WHERE 


+ DENOTES A VALUE GREATER THAN 2/SQRT(NOBE) 
- DENOTES A VALUE LESS THAN -2/SQRT(NOBE) 
DENOTES A NON-SIGNIFICANT VALUE BASED ON THE ABOVE 


CRITERION 
CROSS CORRELATION MATRICES IN TERMS OF +,-,. 
LAGS 1 THROUGH 6 
Fg a ey E as? AE 
a eee = E 
LAGS 7 THROUGH 12 
+ 
+ + 


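======== STEPWISE AUTOREGRESSION SUMMARY ========
     I  RESIDUAL I EIGENVAL. I CHI-SQ I       I SIGN.
 LAG I VARIANCES I OF SIGMA  I  TEST  I  AIC  I PAR. AR
-----+-----------+-----------+--------+-------+---------
  1  I  .492E+02 I  .133E+02 I  10.76 I 6.795 I  . +
     I  .306E+02 I  .665E+02 I        I       I  . +
-----+-----------+-----------+--------+-------+---------
  2  I  .486E+02 I  .133E+02 I  13.41 I 6.789 I  + -
     I  .306E+02 I  .659E+02 I        I       I
-----+-----------+-----------+--------+-------+---------
  3  I  .484E+02 I  .132E+02 I        I       I
     I  .303E+02 I  .655E+02 I        I       I
-----+-----------+-----------+--------+-------+---------
  4  I  .484E+02 I  .131E+02 I        I       I
     I  .302E+02 I  .655E+02 I        I       I
-----+-----------+-----------+--------+-------+---------
  5  I  .480E+02 I  .131E+02 I        I       I
     I  .299E+02 I  .648E+02 I        I       I
-----+-----------+-----------+--------+-------+---------
  6  I  .479E+02 I  .131E+02 I        I       I
     I  .298E+02 I  .647E+02 I        I       I
-----+-----------+-----------+--------+-------+---------
  7  I  .479E+02 I  .130E+02 I        I       I
     I  .298E+02 I  .647E+02 I        I       I
-----+-----------+-----------+--------+-------+---------
  8  I  .477E+02 I  .130E+02 I        I       I
     I  .296E+02 I  .643E+02 I        I       I
-----+-----------+-----------+--------+-------+---------
  9  I  .476E+02 I  .130E+02 I        I       I
     I  .295E+02 I  .642E+02 I        I       I
-----+-----------+-----------+--------+-------+---------
 10  I  .476E+02 I  .130E+02 I        I       I
     I  .295E+02 I  .641E+02 I        I       I
-----+-----------+-----------+--------+-------+---------
 11  I  .475E+02 I  .130E+02 I        I       I
     I  .294E+02 I  .640E+02 I        I       I
-----+-----------+-----------+--------+-------+---------
 12  I  .475E+02 I  .129E+02 I        I       I
     I  .294E+02 I  .640E+02 I        I       I
-----+-----------+-----------+--------+-------+---------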


NOTE: CHI-SQUARED CRITICAL VALUES WITH 4 DEGREES OF FREEDOM ARE
      5 PERCENT: 9.5    1 PERCENT: 13.3

-- % model specification of a VAR(5) model without lag 4.
mtsm m1. series ibm,sp5. model @
   (i-p1*b-p2*b**2-p3*b**3-p5*b**5)series=c+noise.
-- % estimation
mestim m1. hold resi(r1,r2).
-- % demonstration of setting zero constraint
p2(2,2)=0
cp2(2,2)=1
p3(1,2)=0
cp3(1,2)=1
mestim m1. hold resi(r1,r2)
FINAL MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PAR. EST.

CONSTANT VECTOR (STD ERROR)
   1.039 (  0.223 )
   0.390 (  0.176 )

PHI MATRICES
(estimates, significance indicators, and standard errors of the PHI
 matrices at lags 1, 2, 3, and 5; see Table 8.4(b))

ERROR COVARIANCE MATRIX
   48.328570
   24.361464   30.027406

-- % compute residual cross-correlation matrices
miden r1,r2. maxl 12.
-- % prediction
mfore m1. nofs 6.

              IBM                    SP5
 TIME   FORECAST   STD ERR     FORECAST   STD ERR
  997     1.954     6.952        1.698     5.480
  998     0.304     6.988        0.173     5.497
  999    -0.815     7.001       -1.263     5.497
 1000     0.138     7.001       -0.494     5.507
 1001     1.162     7.002        0.408     5.508
 1002     1.294     7.022        0.649     5.528


8.2.5 Impulse Response Function 


Similar to the univariate case, a VAR(p) model can be written as a linear function
of the past innovations, that is,

$$r_t = \mu + a_t + \Psi_1 a_{t-1} + \Psi_2 a_{t-2} + \cdots, \tag{8.21}$$

where $\mu = [\Phi(1)]^{-1} \phi_0$ provided that the inverse exists, and the coefficient matrices
$\Psi_i$ can be obtained by equating the coefficients of $B^i$ in the equation

$$(I - \Phi_1 B - \cdots - \Phi_p B^p)(I + \Psi_1 B + \Psi_2 B^2 + \cdots) = I,$$

where I is the identity matrix. This is a moving-average representation of $r_t$ with
the coefficient matrix $\Psi_i$ being the impact of the past innovation $a_{t-i}$ on $r_t$. Equivalently,
$\Psi_i$ is the effect of $a_t$ on the future observation $r_{t+i}$. Therefore, $\Psi_i$ is often
referred to as the impulse response function of $r_t$. However, since the components
of $a_t$ are often correlated, the interpretation of elements in $\Psi_i$ of Eq. (8.21) is
not straightforward. To aid interpretation, one can use the Cholesky decomposition
mentioned earlier to transform the innovations so that the resulting components
are uncorrelated. Specifically, there exists a lower triangular matrix L such that
$\Sigma = LGL'$, where G is a diagonal matrix and the diagonal elements of L are
unity. See Eq. (8.9). Let $b_t = L^{-1} a_t$. Then, $\mathrm{Cov}(b_t) = G$ so that the elements $b_{jt}$
are uncorrelated. Rewrite Eq. (8.21) as

$$\begin{aligned}
r_t &= \mu + a_t + \Psi_1 a_{t-1} + \Psi_2 a_{t-2} + \cdots \\
&= \mu + LL^{-1} a_t + \Psi_1 LL^{-1} a_{t-1} + \Psi_2 LL^{-1} a_{t-2} + \cdots \\
&= \mu + \Psi_0^* b_t + \Psi_1^* b_{t-1} + \Psi_2^* b_{t-2} + \cdots,
\end{aligned} \tag{8.22}$$

where $\Psi_0^* = L$ and $\Psi_i^* = \Psi_i L$. The coefficient matrices $\Psi_i^*$ are called the impulse
response function of $r_t$ with respect to the orthogonal innovations $b_t$. Specifically,
the (i, j)th element of $\Psi_\ell^*$, that is, $\Psi_{ij}^*(\ell)$, is the impact of $b_{j,t}$ on the future
observation $r_{i,t+\ell}$. In practice, one can further normalize the orthogonal innovation
$b_t$ such that the variance of $b_{it}$ is one. A weakness of the above orthogonalization
is that the result depends on the ordering of the components of $r_t$. In particular,
$b_{1t} = a_{1t}$ so that $a_{1t}$ is not transformed. Different orderings of the components of
$r_t$ may lead to different impulse response functions. Interpretation of the impulse
response function is, therefore, associated with the innovation series $b_t$.
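To make the orthogonalization concrete, the sketch below factors a 2 × 2 covariance matrix as $\Sigma = LGL'$ and forms $\Psi_i^* = \Psi_i L$ for a VAR(1) model, for which $\Psi_i = \Phi_1^i$. The covariance entries are the rounded residual covariances from the SCA output above; the coefficient matrix `PHI` and all function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def ldl_2x2(sigma):
    """Return (L, G) with Sigma = L G L', L unit lower triangular, G diagonal."""
    l21 = sigma[1][0] / sigma[0][0]
    lower = [[1.0, 0.0], [l21, 1.0]]
    g = [[sigma[0][0], 0.0], [0.0, sigma[1][1] - l21 * sigma[0][1]]]
    return lower, g

def orth_irf_var1(phi, sigma, horizon):
    """Psi*_0, ..., Psi*_horizon of a bivariate VAR(1): Psi*_i = Phi^i L."""
    lower, _ = ldl_2x2(sigma)
    psi = [[1.0, 0.0], [0.0, 1.0]]
    out = []
    for _ in range(horizon + 1):
        out.append(mat_mul(psi, lower))
        psi = mat_mul(phi, psi)
    return out

def swap_order(m):
    """Reorder a 2 x 2 matrix so that the second series comes first."""
    return [[m[1][1], m[1][0]], [m[0][1], m[0][0]]]

SIGMA = [[48.33, 24.36], [24.36, 30.03]]   # rounded from the SCA output
PHI = [[0.1, 0.2], [0.0, 0.3]]             # hypothetical VAR(1) coefficients

L, G = ldl_2x2(SIGMA)
rebuilt = mat_mul(mat_mul(L, G), [list(c) for c in zip(*L)])
irf_a = orth_irf_var1(PHI, SIGMA, horizon=3)
# Same model with the series listed in the opposite order, mapped back afterwards.
irf_b = [swap_order(m) for m in orth_irf_var1(swap_order(PHI), swap_order(SIGMA), 3)]
print("L =", L)
print("lag-1 responses, ordering 1:", irf_a[1])
print("lag-1 responses, ordering 2:", irf_b[1])
```

The two sets of responses differ even though the underlying model is the same, which illustrates why the ordering of the components has to be reported alongside any orthogonalized impulse response.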

Both SCA and S-Plus enable one to obtain the impulse response function of a
fitted VAR model. To demonstrate analysis of VAR models in S-Plus, we again use
the monthly log return series of IBM stock and the S&P 500 index of Example
8.1. For details of S-Plus commands, see Zivot and Wang (2003).


S-Plus Demonstration 
The following output has been edited and % denotes explanation: 


> module(finmetrics)
> da=read.table("m-ibmsp2608.txt",header=T) % Load data
> ibm=log(da[,2]+1)*100 % Compute percentage log returns
> sp5=log(da[,3]+1)*100
> y=cbind(ibm,sp5) % Create a vector series
> y1=data.frame(y) % Create a data frame
> ord.choice=VAR(y1,max.ar=10) % Order selection using BIC
> names(ord.choice)
 [1] "R" "coef" "fitted" "residuals" "Sigma" "df.resid"
 [7] "rank" "call" "ar.order" "n.na" "terms" "Y0"
[13] "info"
> ord.choice$ar.order % selected order
[1] 1
> ord.choice$info
       ar(1)    ar(2)    ar(3)    ar(4)    ar(5)    ar(6)
BIC 12325.41 12339.42 12356.58 12376.28 12391.57 12417.2
       ar(7)    ar(8)    ar(9)   ar(10)
BIC 12442.03 12462.52 12484.78 12510.91

> ord=VAR(y1,max.ar=10,criterion='AIC') % Using AIC
> ord$ar.order
[1] 5
> ord$info
       ar(1)    ar(2)    ar(3)    ar(4)    ar(5)    ar(6)
AIC 12296.04 12290.48 12288.07 12288.2 12283.91 12289.96
       ar(7)    ar(8)    ar(9)   ar(10)
AIC 12295.22 12296.13 12298.82 12305.37


The AIC selects a VAR(5) model as before, but BIC selects a VAR(1) model. 
For simplicity, we shall use VAR(1) specification in the demonstration. Note that 
different normalizations are used between the two packages so that the values of 
information criteria appear to be different; see the AIC in Table 8.3. This is not 
important because normalization does not affect order selection. Turn to estimation. 


> var1.fit=VAR(y~ar(1)) % Estimate a VAR(1) model
> summary(var1.fit)

Call:
VAR(formula = y ~ ar(1))

Coefficients:
                 ibm     sp5
(Intercept)   1.0614  0.4087
  (std.err)   0.2249  0.1773
   (t.stat)   4.7198  2.3053

   ibm.lag1  -0.0320 -0.0223
  (std.err)   0.0413  0.0326
   (t.stat)  -0.7728 -0.6855

   sp5.lag1   0.1503  0.1020
  (std.err)   0.0525  0.0414
   (t.stat)   2.8612  2.4637

Regression Diagnostics:
                   ibm    sp5
     R-squared  0.0101 0.0075
Adj. R-squared  0.0081 0.0055
  Resid. Scale  7.0078 5.5247

Information Criteria:
      logL       AIC       BIC        HQ
 -6193.988 12399.977 12429.393 12411.159

                   total residual
Degree of freedom:   995      992

> plot(var1.fit)

Make a plot selection (or 0 to exit):

1: plot: All
2: plot: Response and Fitted Values
3: plot: Residuals
...
8: plot: PACF of Squared Residuals
Selection: 3

The fitted model is

$$\mathrm{IBM}_t = 1.06 - 0.03\,\mathrm{IBM}_{t-1} + 0.15\,\mathrm{SP5}_{t-1} + a_{1t},$$
$$\mathrm{SP5}_t = 0.41 - 0.02\,\mathrm{IBM}_{t-1} + 0.10\,\mathrm{SP5}_{t-1} + a_{2t}.$$

Based on t statistics of the coefficient estimates, only the lagged variable $\mathrm{SP5}_{t-1}$ is
informative in both equations. Figure 8.5 shows the time plots of the two residual
series, where the two horizontal lines indicate the two standard error limits. As
expected, there exist clusters of outlying observations.

Next, we compute 1-step- to 6-step-ahead forecasts and the impulse response
function of the fitted VAR(1) model when the IBM stock return is the first component
of $r_t$. Compared with those of a VAR(5) model in Table 8.5, the forecasts
of the VAR(1) model converge faster to the sample mean of the series.


Figure 8.5 Residual plots of fitting a VAR(1) model to the monthly log returns, in percentages, of
IBM stock and S&P 500 index. Sample period is from January 1926 to December 2008.


> var1.pred=predict(var1.fit,n.predict=6) % Compute forecasts
> summary(var1.pred)

Predicted Values with Standard Errors:

                ibm    sp5
1-step-ahead 1.0798 0.4192
   (std.err) 7.0078 5.5247
2-step-ahead 1.0899 0.4274
   (std.err) 7.0434 5.5453
3-step-ahead 1.0908 0.4280
   (std.err) 7.0436 5.5454
...
6-step-ahead 1.0909 0.4280
   (std.err) 7.0436 5.5454
> plot(var1.pred,y,n.old=12) % Obtain forecast plot
% Below is to compute the impulse response function
> var1.irf=impRes(var1.fit,period=6,std.err='asymptotic')
> summary(var1.irf)

Impulse Response Function:
(with responses in rows, and innovations in columns)

, , lag.0
             ibm    sp5
      ibm 6.9973 0.0000
(std.err) 0.1569 0.0000
      sp5 3.5432 4.2280
(std.err) 0.1558 0.0948

, , lag.1
             ibm    sp5
      ibm 0.3088 0.6353
(std.err) 0.2217 0.2221
      sp5 0.2050 0.4312
(std.err) 0.1746 0.1750

> plot(var1.irf)


Figure 8.6 Forecasting plots of fitted VAR(1) model to monthly log returns, in percentages, of IBM
stock and S&P 500 index. Sample period is from January 1926 to December 2008.


Figure 8.6 shows the forecasts and their pointwise 95% confidence intervals
along with the last 12 data points of the series. Figure 8.7 shows the impulse
response functions of the fitted VAR(1) model where the IBM stock return is the
first component of $r_t$. Since the dynamic dependence of the returns is weak, the
impulse response functions exhibit simple patterns and decay quickly.
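The quick decay can be verified from the reported numbers. For a VAR(1) model the orthogonalized responses satisfy $\Psi_\ell^* = \Phi_1 \Psi_{\ell-1}^*$, so the lag-1 matrix should equal the coefficient matrix times the lag-0 matrix. The sketch below takes the coefficient estimates and the lag-0 responses from the S-Plus output above (normalized so that the orthogonal innovations have unit variance); the function names are illustrative only.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def impulse_responses(phi, lag0, horizon):
    """Responses at lags 0..horizon of a VAR(1): each lag is Phi times the previous."""
    out = [lag0]
    cur = lag0
    for _ in range(horizon):
        cur = mat_mul(phi, cur)
        out.append(cur)
    return out

# Rows are equations (ibm, sp5); columns are lagged ibm and lagged sp5.
PHI = [[-0.0320, 0.1503], [-0.0223, 0.1020]]
# Reported lag-0 responses (responses in rows, innovations in columns).
LAG0 = [[6.9973, 0.0], [3.5432, 4.2280]]
# Reported lag-1 responses, for comparison.
REPORTED_LAG1 = [[0.3088, 0.6353], [0.2050, 0.4312]]

irf = impulse_responses(PHI, LAG0, horizon=6)
gap = max(abs(irf[1][i][j] - REPORTED_LAG1[i][j]) for i in range(2) for j in range(2))
largest_lag6 = max(abs(v) for row in irf[6] for v in row)
print("largest gap to reported lag-1 responses:", gap)
print("largest lag-6 response in absolute value:", largest_lag6)
```

The computed lag-1 responses match the reported ones up to the rounding of the printed coefficients, and by lag 6 the responses are negligible, consistent with the weak dynamic dependence of the two return series.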


8.3 VECTOR MOVING-AVERAGE MODELS

A vector moving-average model of order q, or VMA(q), is in the form

$$r_t = \theta_0 + a_t - \Theta_1 a_{t-1} - \cdots - \Theta_q a_{t-q} \quad \text{or} \quad r_t = \theta_0 + \Theta(B) a_t, \tag{8.23}$$

where $\theta_0$ is a k-dimensional vector, $\Theta_i$ are $k \times k$ matrices, and
$\Theta(B) = I - \Theta_1 B - \cdots - \Theta_q B^q$ is the MA matrix polynomial in the back-shift operator B.
Similar to the univariate case, VMA(q) processes are weakly stationary provided
that the covariance matrix $\Sigma$ of $a_t$ exists. Taking expectation of Eq. (8.23), we
obtain that $\mu = E(r_t) = \theta_0$. Thus, the constant vector $\theta_0$ is the mean vector of $r_t$
for a VMA model.

Figure 8.7 Plots of impulse response functions of orthogonal innovations for fitted VAR(1) model to
monthly log returns, in percentages, of IBM stock and S&P 500 index. Sample period is from January
1926 to December 2008.

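Let $\tilde{r}_t = r_t - \theta_0$ be the mean-corrected VMA(q) process. Then using Eq. (8.23)
and the fact that $\{a_t\}$ has no serial correlations, we have


1. $\mathrm{Cov}(r_t, a_t) = \Sigma$.

2. $\Gamma_0 = \Sigma + \Theta_1 \Sigma \Theta_1' + \cdots + \Theta_q \Sigma \Theta_q'$.

3. $\Gamma_\ell = 0$ if $\ell > q$.

4. $\Gamma_\ell = \sum_{j=\ell}^{q} \Theta_j \Sigma \Theta_{j-\ell}'$ if $1 \le \ell \le q$, where $\Theta_0 = -I$.


Since $\Gamma_\ell = 0$ for $\ell > q$, the cross-correlation matrices (CCMs) of a VMA(q) process
$r_t$ satisfy

$$\rho_\ell = 0, \quad \ell > q. \tag{8.24}$$

Therefore, similar to the univariate case, the sample CCMs can be used to identify
the order of a VMA process.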
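The moment formulas above translate directly into code. The following sketch evaluates $\Gamma_\ell$ for a VMA(q) model using the convention $\Theta_0 = -I$; the MA matrix `THETA` and the function names are hypothetical, and $\Sigma$ is taken to be the identity purely to keep the arithmetic transparent.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def vma_autocov(thetas, sigma, lag):
    """Gamma_lag of a VMA(q) with MA matrices thetas = [Theta_1, ..., Theta_q].

    Uses Gamma_lag = sum_{j=lag}^{q} Theta_j Sigma Theta_{j-lag}' with Theta_0 = -I,
    which also covers lag 0 and returns the zero matrix when lag > q.
    """
    k = len(sigma)
    neg_identity = [[-1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    th = [neg_identity] + list(thetas)
    total = [[0.0] * k for _ in range(k)]
    for j in range(lag, len(thetas) + 1):
        term = mat_mul(mat_mul(th[j], sigma), transpose(th[j - lag]))
        total = [[total[r][c] + term[r][c] for c in range(k)] for r in range(k)]
    return total

# Hypothetical VMA(1) parameters (illustration only).
THETA = [[0.2, 0.3], [-0.6, 1.1]]
SIGMA = [[1.0, 0.0], [0.0, 1.0]]
gamma0 = vma_autocov([THETA], SIGMA, 0)
gamma1 = vma_autocov([THETA], SIGMA, 1)
gamma2 = vma_autocov([THETA], SIGMA, 2)
print("Gamma_0 =", gamma0)
print("Gamma_1 =", gamma1)
print("Gamma_2 =", gamma2)
```

For this VMA(1) example $\Gamma_1 = -\Theta\Sigma$ and $\Gamma_2$ is exactly zero, which is the cutoff property that the sample CCMs exploit for order identification.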
To better understand the VMA processes, let us consider the bivariate MA(1)
model

$$r_t = \theta_0 + a_t - \Theta a_{t-1} = \mu + a_t - \Theta a_{t-1}, \tag{8.25}$$

where, for simplicity, the subscript of $\Theta_1$ is removed. This model can be written
explicitly as

$$\begin{bmatrix} r_{1t} \\ r_{2t} \end{bmatrix}
= \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}
+ \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}
- \begin{bmatrix} \Theta_{11} & \Theta_{12} \\ \Theta_{21} & \Theta_{22} \end{bmatrix}
\begin{bmatrix} a_{1,t-1} \\ a_{2,t-1} \end{bmatrix}. \tag{8.26}$$

It says that the current return series $r_t$ only depends on the current and past shocks.
Therefore, the model is a finite-memory model.

Consider the equation for $r_{1t}$ in Eq. (8.26). The parameter $\Theta_{12}$ denotes the linear
dependence of $r_{1t}$ on $a_{2,t-1}$ in the presence of $a_{1,t-1}$. If $\Theta_{12} = 0$, then $r_{1t}$ does not
depend on the lagged values of $a_{2t}$ and, hence, the lagged values of $r_{2t}$. Similarly,
if $\Theta_{21} = 0$, then $r_{2t}$ does not depend on the past values of $r_{1t}$. The off-diagonal
elements of $\Theta$ thus show the dynamic dependence between the component series.

For this simple VMA(1) model, we can classify the relationships between $r_{1t}$ and
$r_{2t}$ as follows:


1. They are uncoupled series if $\Theta_{12} = \Theta_{21} = 0$.

2. There is a unidirectional dynamic relationship from $r_{1t}$ to $r_{2t}$ if $\Theta_{12} = 0$,
but $\Theta_{21} \ne 0$. The opposite unidirectional relationship holds if $\Theta_{21} = 0$, but
$\Theta_{12} \ne 0$.

3. There is a feedback relationship between $r_{1t}$ and $r_{2t}$ if $\Theta_{12} \ne 0$ and $\Theta_{21} \ne 0$.


Finally, the concurrent correlation between $r_{it}$ is the same as that between $a_{it}$. The
previous classification can be generalized to a VMA(q) model.


Estimation 

Compared with VAR models, estimation of VMA models is much more involved;
see Hillmer and Tiao (1979), Lütkepohl (2005), and the references therein.
For the likelihood approach, there are two methods available. The first is the
conditional-likelihood method that assumes that $a_t = 0$ for $t \le 0$. The second is
the exact-likelihood method that treats $a_t$ with $t \le 0$ as additional parameters of
the model. To gain some insight into the problem of estimation, we consider the
VMA(1) model in Eq. (8.25). Suppose that the data are $\{r_t \mid t = 1, \ldots, T\}$ and $a_t$
is multivariate normal. For a VMA(1) model, the data depend on $a_0$.


Conditional MLE

The conditional-likelihood method assumes that $a_0 = 0$. Under such an assumption
and rewriting the model as $a_t = r_t - \theta_0 + \Theta a_{t-1}$, we can compute the shock $a_t$
recursively as

$$a_1 = r_1 - \theta_0, \quad a_2 = r_2 - \theta_0 + \Theta a_1, \quad \ldots.$$
Consequently, the likelihood function of the data becomes

$$f(r_1, \ldots, r_T \mid \theta_0, \Theta, \Sigma) = \prod_{t=1}^{T} \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} a_t' \Sigma^{-1} a_t\right),$$

which can be evaluated to obtain the parameter estimates.
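The conditional likelihood is straightforward to evaluate once the shocks have been recovered. The sketch below does this for a bivariate VMA(1) model under the assumption $a_0 = 0$; the function names and the parameter values are hypothetical, and a numerical optimizer (not shown) would be needed to maximize the function over $\theta_0$, $\Theta$, and $\Sigma$.

```python
import math

def simulate_vma1(theta0, theta, shocks):
    """Generate r_t = theta0 + a_t - Theta a_{t-1} with a_0 = 0 (bivariate)."""
    data = []
    prev = [0.0, 0.0]
    for a in shocks:
        data.append([theta0[i] + a[i] - theta[i][0] * prev[0] - theta[i][1] * prev[1]
                     for i in range(2)])
        prev = a
    return data

def cond_loglik_vma1(data, theta0, theta, sigma):
    """Conditional Gaussian log likelihood of a bivariate VMA(1), assuming a_0 = 0."""
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    inv = [[sigma[1][1] / det, -sigma[0][1] / det],
           [-sigma[1][0] / det, sigma[0][0] / det]]
    prev = [0.0, 0.0]
    shocks = []
    loglik = 0.0
    for r in data:
        # a_t = r_t - theta0 + Theta a_{t-1}
        a = [r[i] - theta0[i] + theta[i][0] * prev[0] + theta[i][1] * prev[1]
             for i in range(2)]
        quad = sum(a[i] * inv[i][j] * a[j] for i in range(2) for j in range(2))
        # With k = 2, -(k/2) ln(2 pi) equals -ln(2 pi).
        loglik += -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
        shocks.append(a)
        prev = a
    return loglik, shocks

# Hypothetical parameters and shocks (illustration only).
THETA0 = [1.0, 0.4]
THETA = [[0.2, 0.3], [-0.6, 1.1]]
SIGMA = [[48.0, 24.0], [24.0, 30.0]]
TRUE_SHOCKS = [[0.5, -1.0], [0.2, 0.3], [-0.7, 0.1], [1.5, 2.0]]
DATA = simulate_vma1(THETA0, THETA, TRUE_SHOCKS)
loglik, recovered = cond_loglik_vma1(DATA, THETA0, THETA, SIGMA)
print("conditional log likelihood:", loglik)
print("recovered shocks:", recovered)
```

When the data are generated with $a_0 = 0$ and the true parameters are supplied, the recursion returns the original shocks; with real data the assumption $a_0 = 0$ is only an approximation, which is what motivates the exact-likelihood method.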


Exact MLE 

For the exact-likelihood method, $a_0$ is an unknown vector that must be estimated
from the data to evaluate the likelihood function. For simplicity, let $\tilde{r}_t = r_t - \theta_0$
be the mean-corrected series. Using $\tilde{r}_t$ and Eq. (8.25), we have

$$a_t = \tilde{r}_t + \Theta a_{t-1}. \tag{8.27}$$

By repeated substitutions, $a_0$ is related to all $\tilde{r}_t$ as

$$\begin{aligned}
a_1 &= \tilde{r}_1 + \Theta a_0, \\
a_2 &= \tilde{r}_2 + \Theta a_1 = \tilde{r}_2 + \Theta \tilde{r}_1 + \Theta^2 a_0, \\
&\;\;\vdots \\
a_T &= \tilde{r}_T + \Theta \tilde{r}_{T-1} + \cdots + \Theta^{T-1} \tilde{r}_1 + \Theta^T a_0.
\end{aligned} \tag{8.28}$$

Thus, $a_0$ is a linear function of the data if $\theta_0$ and $\Theta$ are given. This result enables
us to estimate $a_0$ using the data and initial estimates of $\theta_0$ and $\Theta$. More specifically,
given $\theta_0$, $\Theta$, and the data, we can define

$$r_t^* = \tilde{r}_t + \Theta \tilde{r}_{t-1} + \cdots + \Theta^{t-1} \tilde{r}_1, \quad \text{for } t = 1, 2, \ldots, T.$$

Equation (8.28) can then be rewritten as

$$\begin{aligned}
r_1^* &= -\Theta a_0 + a_1, \\
r_2^* &= -\Theta^2 a_0 + a_2, \\
&\;\;\vdots \\
r_T^* &= -\Theta^T a_0 + a_T.
\end{aligned}$$

This is in the form of a multiple linear regression with parameter vector $a_0$, even
though the covariance matrix $\Sigma$ of $a_t$ may not be a diagonal matrix. If an initial
estimate of $\Sigma$ is also available, one can premultiply each equation of the prior
system by $\Sigma^{-1/2}$, which is the square-root matrix of $\Sigma^{-1}$. The resulting system is
indeed a multiple linear regression, and the ordinary least-squares method can be
used to obtain an estimate of $a_0$. Denote the estimate by $\hat{a}_0$.
Using the estimate $\hat{a}_0$, we can compute the shocks $a_t$ recursively as

$$a_1 = r_1 - \theta_0 + \Theta \hat{a}_0, \quad a_2 = r_2 - \theta_0 + \Theta a_1, \quad \ldots.$$

This recursion is a linear transformation from $(a_0, r_1, \ldots, r_T)$ to $(a_0, a_1, \ldots, a_T)$,
from which we can (a) obtain the joint distribution of $a_0$ and the data, and (b)
integrate out $a_0$ to derive the exact-likelihood function of the data. The resulting
likelihood function can then be evaluated to obtain the exact ML estimates. For
details, see Hillmer and Tiao (1979).

In summary, the exact-likelihood method works as follows. Given initial estimates
of $\theta_0$, $\Theta$, and $\Sigma$, one uses Eq. (8.28) to derive an estimate of $a_0$. This
estimate is in turn used to compute $a_t$ recursively using Eq. (8.27) and starting
with $a_1 = \tilde{r}_1 + \Theta \hat{a}_0$. The resulting $\{a_t\}_{t=1}^{T}$ are then used to evaluate the exact-likelihood
function of the data to update the estimates of $\theta_0$, $\Theta$, and $\Sigma$. The whole
process is then repeated until the estimates converge. This iterative method to
evaluate the exact-likelihood function applies to the general VMA(q) models.

From the previous discussion, the exact-likelihood method requires more inten- 
sive computation than the conditional-likelihood approach does. But it provides 
more accurate parameter estimates, especially when some eigenvalues of © are 
close to 1 in modulus. Hillmer and Tiao (1979) provide some comparison between 
the conditional- and exact-likelihood estimations of VMA models. In multivariate 
time series analysis, the exact maximum-likelihood method becomes important if 
one suspects that the data might have been overdifferenced. Overdifferencing may 
occur in many situations (e.g., differencing individual components of a cointegrated 
system; see discussion later on cointegration). 

In summary, building a VMA model involves three steps: (a) use the sample
cross-correlation matrices to specify the order q (for a VMA(q) model, $\rho_\ell = 0$ for
$\ell > q$); (b) estimate the specified model by using either the conditional- or exact-likelihood
method (the exact method is preferred when the sample size is not
large); and (c) check the fitted model for adequacy [e.g., by applying the
$Q_k(m)$ statistics to the residual series]. Finally, forecasts of a VMA model can be
obtained by using the same procedure as a univariate MA model.


Example 8.5. Consider again the bivariate series of monthly log returns in
percentages of IBM stock and the S&P 500 index from January 1926 to December
2008. Since significant cross correlations occur mainly at lags 1, 2, 3 and 5,
we employ the VMA(5) model

$$r_t = \theta_0 + a_t - \Theta_1 a_{t-1} - \Theta_2 a_{t-2} - \Theta_3 a_{t-3} - \Theta_5 a_{t-5} \tag{8.29}$$

for the data. Table 8.6 shows the estimation results of the model. The $Q_k(m)$
statistics for the residuals of the simplified model give $Q_2(4) = 16.00$ and $Q_2(8) = 29.46$.
Compared with chi-squared distributions with 10 and 26 degrees of freedom,
the p values of these statistics are 0.10 and 0.291, respectively. Thus, the model
is adequate at the 5% significance level.
TABLE 8.6 Estimation Results for Monthly Log Returns of IBM Stock and S&P
500 Index Using the Vector Moving-Average Model in Eq. (8.29)(a)


Parameter |a) o | œo | œo | w 


(a) Full Model with Conditional-Likelihood Method 


Estimate —0.15 
—0.15 

Standard error 0.05 
0.04 

Estimate —0.15 
—0.15 

Standard error 0.05 
0.04 

Estimate —0.10 
—0.09 

Standard error 0.04 
0.03 


(a) The sample period is from January 1926 to December 2008. The residual covariance matrix is not
shown as it is similar to that in Table 8.4.


From Table 8.6, we make the following observations: 


1. The difference between conditional- and exact-likelihood estimates is small 
for this particular example. This is not surprising because the sample size is 
not small and, more important, the dynamic structure of the data is weak. 

2. The VMA(5) model provides essentially the same dynamic relationship for 
the series as that of the VAR(5) model in Example 8.4. The monthly log 
return of IBM stock depends on the previous returns of the S&P 500 index. 
The market return, in contrast, does not depend on lagged returns of IBM 
stock. In other words, the dynamic structure of the data is driven by the 
market return, not by the IBM return. The concurrent correlation between 
the two returns remains strong, however. 


8.4 VECTOR ARMA MODELS 


Univariate ARMA models can also be generalized to handle vector time series. The 
resulting models are called VARMA models. The generalization, however, encoun- 
ters some new issues that do not occur in developing VAR and VMA models. One 
of the issues is the identifiability problem. Unlike the univariate ARMA models, 
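VARMA models may not be uniquely defined. For example, the VMA(1) model 

$$
\begin{bmatrix} r_{1t} \\ r_{2t} \end{bmatrix}
= \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}
- \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} a_{1,t-1} \\ a_{2,t-1} \end{bmatrix}
$$

is identical to the VAR(1) model 

$$
\begin{bmatrix} r_{1t} \\ r_{2t} \end{bmatrix}
- \begin{bmatrix} 0 & -2 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} r_{1,t-1} \\ r_{2,t-1} \end{bmatrix}
= \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}.
$$
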
The equivalence of the two models can easily be seen by examining their component 
models. For the VMA(1) model, we have 

$$r_{1t} = a_{1t} - 2a_{2,t-1}, \qquad r_{2t} = a_{2t}.$$

For the VAR(1) model, the equations are 

$$r_{1t} + 2r_{2,t-1} = a_{1t}, \qquad r_{2t} = a_{2t}.$$

From the model for $r_{2t}$, we have $r_{2,t-1} = a_{2,t-1}$. Therefore, the models for $r_{1t}$ are 
identical. This type of identifiability problem is harmless because either model can 
be used in a real application. 

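Another type of identifiability problem is more troublesome. Consider the 
VARMA(1,1) model 

$$
\begin{bmatrix} r_{1t} \\ r_{2t} \end{bmatrix}
- \begin{bmatrix} 0.8 & -2 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} r_{1,t-1} \\ r_{2,t-1} \end{bmatrix}
= \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}
- \begin{bmatrix} -0.5 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} a_{1,t-1} \\ a_{2,t-1} \end{bmatrix}.
$$

This model is identical to the VARMA(1,1) model 

$$
\begin{bmatrix} r_{1t} \\ r_{2t} \end{bmatrix}
- \begin{bmatrix} 0.8 & -2 + \eta \\ 0 & \omega \end{bmatrix}
\begin{bmatrix} r_{1,t-1} \\ r_{2,t-1} \end{bmatrix}
= \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}
- \begin{bmatrix} -0.5 & \eta \\ 0 & \omega \end{bmatrix}
\begin{bmatrix} a_{1,t-1} \\ a_{2,t-1} \end{bmatrix}
$$

for any nonzero $\omega$ and $\eta$. In this particular instance, the equivalence occurs 
because we have $r_{2t} = a_{2t}$ in both models. The effects of the parameters $\omega$ and 
$\eta$ on the system cancel out between AR and MA parts of the second model. 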
Such an identifiability problem is serious because, without proper constraints, the 
likelihood function of a vector ARMA(1,1) model for the data is not uniquely 
defined, resulting in a situation similar to the exact multicollinearity in a regression 
analysis. This type of identifiability problem can occur in a vector model even if 
none of the components is a white noise series. 
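
The cancellation can be checked numerically. The following is a minimal sketch, not part of the original analysis: it feeds the same shocks through both VARMA(1,1) parameterizations, taking the second model's AR matrix as [[0.8, -2 + eta], [0, omega]] and its MA matrix as [[-0.5, eta], [0, omega]]. The values of omega and eta, the starting value r_0 = a_0, and the function name are assumptions made only for illustration.

```python
import random


def simulate_varma11(phi, theta, shocks):
    """Simulate r_t = phi r_{t-1} + a_t - theta a_{t-1} for a bivariate series.

    The recursion is started from r_0 = a_0 (an assumption of this sketch).
    """
    series = [list(shocks[0])]
    for t in range(1, len(shocks)):
        r_prev, a, a_prev = series[-1], shocks[t], shocks[t - 1]
        r = [
            sum(phi[i][j] * r_prev[j] for j in range(2))
            + a[i]
            - sum(theta[i][j] * a_prev[j] for j in range(2))
            for i in range(2)
        ]
        series.append(r)
    return series


random.seed(1)
shocks = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]

# First VARMA(1,1) model of the text.
phi_a = [[0.8, -2.0], [0.0, 0.0]]
theta_a = [[-0.5, 0.0], [0.0, 0.0]]

# Second model with arbitrary nonzero omega and eta (illustrative values).
omega, eta = 0.7, 1.3
phi_b = [[0.8, -2.0 + eta], [0.0, omega]]
theta_b = [[-0.5, eta], [0.0, omega]]

series_a = simulate_varma11(phi_a, theta_a, shocks)
series_b = simulate_varma11(phi_b, theta_b, shocks)

# Largest absolute difference between the two simulated paths.
max_gap = max(
    abs(u - v) for ra, rb in zip(series_a, series_b) for u, v in zip(ra, rb)
)
print(max_gap)
```

Up to floating-point rounding the two paths coincide, so the data carry no information about omega and eta. This is why the likelihood is flat in those directions.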

These two simple examples highlight the new issues involved in the general- 
ization to VARMA models. Building a VARMA model for a given data set thus 
requires some attention. In the time series literature, methods of structural specifi- 
cation have been proposed to overcome the identifiability problem; see Tiao and 
Tsay (1989), Tsay (1991), and the references therein. We do not discuss the detail of 
structural specification here because VAR and VMA models are sufficient in most 
financial applications. When VARMA models are used, only lower order models 
are entertained [e.g., a VARMA(1,1) or VARMA(2,1) model] especially when the 
time series involved are not seasonal. 

A VARMA($p$, $q$) model can be written as 

$$\Phi(B) r_t = \phi_0 + \Theta(B) a_t,$$

where $\Phi(B) = I - \Phi_1 B - \cdots - \Phi_p B^p$ and $\Theta(B) = I - \Theta_1 B - \cdots - \Theta_q B^q$ are 
two $k \times k$ matrix polynomials. We assume that the two matrix polynomials have 
no left common factors; otherwise, the model can be simplified. The necessary 
and sufficient condition of weak stationarity for $r_t$ is the same as that for the 
VAR($p$) model with matrix polynomial $\Phi(B)$. For $v > 0$, the $(i, j)$th elements 
of the coefficient matrices $\Phi_v$ and $\Theta_v$ measure the linear dependence of $r_{it}$ on 
$r_{j,t-v}$ and $a_{j,t-v}$, respectively. If the $(i, j)$th element is zero for all AR and MA 
coefficient matrices, then $r_{it}$ does not depend on the lagged values of $r_{jt}$. However, 
the converse proposition does not hold in a VARMA model. In other words, nonzero 
coefficients at the $(i, j)$th position of AR and MA matrices may exist even when 
$r_{it}$ does not depend on any lagged value of $r_{jt}$. 
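To illustrate, consider the following bivariate model 

$$
\begin{bmatrix} \Phi_{11}(B) & \Phi_{12}(B) \\ \Phi_{21}(B) & \Phi_{22}(B) \end{bmatrix}
\begin{bmatrix} r_{1t} \\ r_{2t} \end{bmatrix}
= \begin{bmatrix} \Theta_{11}(B) & \Theta_{12}(B) \\ \Theta_{21}(B) & \Theta_{22}(B) \end{bmatrix}
\begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}.
$$

Here the necessary and sufficient conditions for the existence of a unidirectional 
dynamic relationship from $r_{1t}$ to $r_{2t}$ are 

$$\Phi_{22}(B)\Theta_{12}(B) - \Phi_{12}(B)\Theta_{22}(B) = 0,$$

but 

$$\Phi_{11}(B)\Theta_{21}(B) - \Phi_{21}(B)\Theta_{11}(B) \neq 0. \tag{8.30}$$

These conditions can be obtained as follows. Letting 

$$\Omega(B) = |\Phi(B)| = \Phi_{11}(B)\Phi_{22}(B) - \Phi_{12}(B)\Phi_{21}(B)$$

be the determinant of the AR matrix polynomial and premultiplying the model by 
the matrix 

$$
\begin{bmatrix} \Phi_{22}(B) & -\Phi_{12}(B) \\ -\Phi_{21}(B) & \Phi_{11}(B) \end{bmatrix},
$$

we can rewrite the bivariate model as 

$$
\Omega(B) \begin{bmatrix} r_{1t} \\ r_{2t} \end{bmatrix}
= \begin{bmatrix}
\Phi_{22}(B)\Theta_{11}(B) - \Phi_{12}(B)\Theta_{21}(B) & \Phi_{22}(B)\Theta_{12}(B) - \Phi_{12}(B)\Theta_{22}(B) \\
\Phi_{11}(B)\Theta_{21}(B) - \Phi_{21}(B)\Theta_{11}(B) & \Phi_{11}(B)\Theta_{22}(B) - \Phi_{21}(B)\Theta_{12}(B)
\end{bmatrix}
\begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}.
$$

Figure 8.8 Time plots of log U.S. monthly interest rates from April 1953 to January 2001. Solid line 
denotes 1-year Treasury constant maturity rate and dashed line denotes 3-year rate. 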


Consider the equation for $r_{1t}$. The first condition in Eq. (8.30) shows that $r_{1t}$ does 
not depend on any past value of $a_{2t}$ or $r_{2t}$. From the equation for $r_{2t}$, the second 
condition in Eq. (8.30) implies that $r_{2t}$ indeed depends on some past values of 
$a_{1t}$. Based on Eq. (8.30), $\Theta_{12}(B) = \Phi_{12}(B) = 0$ is a sufficient, but not necessary, 
condition for the unidirectional relationship from $r_{1t}$ to $r_{2t}$. 

Estimation of a VARMA model can be carried out by either the conditional or 
exact maximum-likelihood method. The $Q_k(m)$ statistic continues to apply to the 
residual series of a fitted model, but the degrees of freedom of its asymptotic 
chi-squared distribution are $k^2 m - g$, where $g$ is the number of estimated parameters 
in both the AR and MA coefficient matrices. 


Example 8.6. To demonstrate VARMA modeling, we consider two U.S. 
monthly interest rate series. The first series is the 1-year Treasury constant maturity 
rate, and the second series is the 3-year Treasury constant maturity rate. The data 
are obtained from the Federal Reserve Bank of St. Louis, and the sampling period 
is from April 1953 to January 2001. There are 574 observations. To ensure the 
positiveness of U.S. interest rates, we analyze the log series. Figure 8.8 shows the 
time plots of the two log interest rate series. The solid line denotes the 1-year 
maturity rate. The two series moved closely in the sampling period. 

The $M(i)$ statistics and the AIC criterion specify a VAR(4) model for the data. 
However, we employ a VARMA(2,1) model because the two models provide similar 
fits. Table 8.7 shows the parameter estimates of the VARMA(2,1) model obtained 
by the exact-likelihood method. We removed the insignificant parameters and 
reestimated the simplified model. The residual series of the fitted model has some minor 
serial and cross correlations at lags 7 and 11. Figure 8.9 shows the residual plots 
and indicates the existence of some outlying data points. The model can be further 
improved, but it seems to capture the dynamic structure of the data reasonably well. 

TABLE 8.7 Parameter Estimates of VARMA(2,1) Model for Two Monthly U.S. 
Interest Rate Series Based on Exact-Likelihood Method 

Parameter        $\Phi_1$        $\Phi_2$        $\phi_0$   $\Theta_1$      $\Sigma \times 10^3$
Estimate         1.82  -0.97     -0.84  0.98     0.028      0.90  -1.66     3.58  2.50
                 -      0.99     -      -        0.025      -     -0.47     2.50  2.19
Standard error   0.03   0.08     0.03   0.08                0.03   0.10
                 -      0.01     -      -                   -      0.04

Figure 8.9 Residual plots for log U.S. monthly interest rate series of Example 8.6. Fitted model is 
VARMA(2,1): (a) 1-year rate and (b) 3-year rate. 

The final VARMA(2,1) model shows some interesting characteristics of the 
data. First, the interest rate series are highly contemporaneously correlated. The 
concurrent correlation coefficient is $2.5/\sqrt{3.58 \times 2.19} = 0.893$. Second, there is a 
unidirectional linear relationship from the 3-year rate to the 1-year rate because the 
(2, 1)th elements of all AR and MA matrices are zero, but some (1, 2)th element 
is not zero. As a matter of fact, the model in Table 8.7 shows that 

$$
\begin{aligned}
r_{3t} &= 0.025 + 0.99 r_{3,t-1} + a_{3t} + 0.47 a_{3,t-1}, \\
r_{1t} &= 0.028 + 1.82 r_{1,t-1} - 0.84 r_{1,t-2} - 0.97 r_{3,t-1} + 0.98 r_{3,t-2} \\
&\quad + a_{1t} - 0.90 a_{1,t-1} + 1.66 a_{3,t-1},
\end{aligned}
$$

where $r_{it}$ is the log series of the $i$-year interest rate and $a_{it}$ is the corresponding shock 
series. Therefore, the 3-year interest rate does not depend on the past values of 
the 1-year rate, but the 1-year rate depends on the past values of the 3-year rate. 
Third, the two interest rate series appear to be unit-root nonstationary. Using the 
back-shift operator B, the model can be rewritten approximately as 


$$
\begin{aligned}
(1 - B) r_{3t} &= 0.03 + (1 + 0.47B) a_{3t}, \\
(1 - B)(1 - 0.82B) r_{1t} &= 0.03 - 0.97B(1 - B) r_{3t} + (1 - 0.9B) a_{1t} + 1.66B a_{3t}.
\end{aligned}
$$


Finally, the SCA commands used in the analysis are given in Appendix C. 
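
As a quick arithmetic check of the concurrent correlation quoted in Example 8.6, the sketch below recomputes it from the residual covariance matrix reported in Table 8.7. The entries are in units of 10^-3, which cancel in the ratio, and the variable names are illustrative only.

```python
import math

# Residual covariance matrix (times 10^3) as reported in Table 8.7.
sigma = [[3.58, 2.50], [2.50, 2.19]]

# Correlation = covariance / sqrt(product of the two variances).
corr = sigma[0][1] / math.sqrt(sigma[0][0] * sigma[1][1])
print(round(corr, 3))  # 0.893
```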


8.4.1 Marginal Models of Components 


Given a vector model for $r_t$, the implied univariate models for the components $r_{it}$ 
are the marginal models. For a $k$-dimensional ARMA($p$, $q$) model, the marginal 
models are ARMA[$kp$, $(k - 1)p + q$]. This result can be obtained in two steps. 
First, the marginal model of a VMA($q$) model is univariate MA($q$). Assume that 
$r_t$ is a VMA($q$) process. Because the cross-correlation matrix of $r_t$ vanishes after 
lag $q$ (i.e., $\rho_\ell = 0$ for $\ell > q$), the ACF of $r_{it}$ is zero beyond lag $q$. Therefore, $r_{it}$ is 
an MA process and its univariate model is in the form $r_{it} = \theta_{i,0} + b_{it} + \sum_{j=1}^{q} \theta_{i,j} b_{i,t-j}$, 
where $\{b_{it}\}$ is a sequence of uncorrelated random variables with mean zero and 
variance $\sigma_{ib}^2$. The parameters $\theta_{i,j}$ and $\sigma_{ib}$ are functions of the parameters of the 
VMA model for $r_t$. 

The second step to obtain the result is to diagonalize the AR matrix polynomial 
of a VARMA($p$, $q$) model. For illustration, consider the bivariate AR(1) model 

$$
\begin{bmatrix} 1 - \Phi_{11}B & -\Phi_{12}B \\ -\Phi_{21}B & 1 - \Phi_{22}B \end{bmatrix}
\begin{bmatrix} r_{1t} \\ r_{2t} \end{bmatrix}
= \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}.
$$

Premultiplying the model by the matrix polynomial 

$$
\begin{bmatrix} 1 - \Phi_{22}B & \Phi_{12}B \\ \Phi_{21}B & 1 - \Phi_{11}B \end{bmatrix},
$$

we obtain 

$$
\left[(1 - \Phi_{11}B)(1 - \Phi_{22}B) - \Phi_{12}\Phi_{21}B^2\right]
\begin{bmatrix} r_{1t} \\ r_{2t} \end{bmatrix}
= \begin{bmatrix} 1 - \Phi_{22}B & \Phi_{12}B \\ \Phi_{21}B & 1 - \Phi_{11}B \end{bmatrix}
\begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}.
$$

The left-hand side of the prior equation shows that the univariate AR polynomials 
for $r_{it}$ are of order 2. In contrast, the right-hand side of the equation is in a VMA(1) 
form. Using the result of VMA models in step 1, we show that the univariate 
model for $r_{it}$ is ARMA(2,1). The technique generalizes easily to the $k$-dimensional 
VAR(1) model, and the marginal models are ARMA($k$, $k - 1$). More generally, for 
a $k$-dimensional VAR($p$) model, the marginal models are ARMA[$kp$, $(k - 1)p$]. 
The result for VARMA models follows directly from those of VMA and VAR 
models. 

The order [$kp$, $(k - 1)p + q$] is the maximum order (i.e., the upper bound) for 
the marginal models. The actual marginal order of $r_{it}$ can be much lower. 
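
The diagonalization step can be illustrated with a few lines of polynomial arithmetic. The sketch below is only an illustration under stated assumptions: the VAR(1) coefficient matrix is hypothetical, polynomials in B are stored as coefficient lists [c0, c1, ...], and the helper names are not from the text. It forms the determinant and the adjugate of I - Phi B and reads off the implied AR and MA orders of the marginal models.

```python
def poly_mul(p, q):
    """Multiply two polynomials in B given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_sub(p, q):
    """Subtract polynomial q from polynomial p."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]


def degree(p, tol=1e-12):
    """Degree of a polynomial, ignoring numerically zero trailing coefficients."""
    d = len(p) - 1
    while d > 0 and abs(p[d]) < tol:
        d -= 1
    return d


# Hypothetical VAR(1) coefficient matrix, chosen only for illustration.
phi = [[0.2, 0.3], [-0.6, 1.1]]

# AR matrix polynomial I - phi * B, stored entrywise as coefficient lists.
ar = [
    [[1.0, -phi[0][0]], [0.0, -phi[0][1]]],
    [[0.0, -phi[1][0]], [1.0, -phi[1][1]]],
]

# Adjugate of the 2 x 2 polynomial matrix (the premultiplying matrix).
adj = [
    [ar[1][1], [-c for c in ar[0][1]]],
    [[-c for c in ar[1][0]], ar[0][0]],
]

# Determinant = common univariate AR polynomial of both components.
det = poly_sub(poly_mul(ar[0][0], ar[1][1]), poly_mul(ar[0][1], ar[1][0]))

ar_order = degree(det)
ma_order = max(degree(entry) for row in adj for entry in row)
print(det, ar_order, ma_order)
```

For this matrix the determinant is 1 - 1.3B + 0.40B^2 and the adjugate is linear in B, which matches the ARMA(2,1) upper bound for a bivariate VAR(1) model.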


8.5  UNIT-ROOT NONSTATIONARITY AND COINTEGRATION 


When modeling several unit-root nonstationary time series jointly, one may 
encounter the case of cointegration. Consider the bivariate ARMA(1,1) model 

$$
\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}
- \begin{bmatrix} 0.5 & -1.0 \\ -0.25 & 0.5 \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix}
= \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}
- \begin{bmatrix} 0.2 & -0.4 \\ -0.1 & 0.2 \end{bmatrix}
\begin{bmatrix} a_{1,t-1} \\ a_{2,t-1} \end{bmatrix},
\tag{8.31}
$$

where the covariance matrix $\Sigma$ of the shock $a_t$ is positive definite. This is not a 
weakly stationary model because the two eigenvalues of the AR coefficient matrix 
are 0 and 1. Figure 8.10 shows the time plots of a simulated series of the model with 
200 data points and $\Sigma = I$, whereas Figure 8.11 shows the sample autocorrelations 
of the two component series $x_{it}$. It is easy to see that the two series have high 
autocorrelations and exhibit features of unit-root nonstationarity. The two marginal 
models of $x_t$ are indeed unit-root nonstationary. Rewrite the model as 

$$
\begin{bmatrix} 1 - 0.5B & B \\ 0.25B & 1 - 0.5B \end{bmatrix}
\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}
= \begin{bmatrix} 1 - 0.2B & 0.4B \\ 0.1B & 1 - 0.2B \end{bmatrix}
\begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}.
$$

Premultiplying the above equation by 

$$
\begin{bmatrix} 1 - 0.5B & -B \\ -0.25B & 1 - 0.5B \end{bmatrix},
$$

we obtain the result 

$$
\begin{bmatrix} 1 - B & 0 \\ 0 & 1 - B \end{bmatrix}
\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}
= \begin{bmatrix} 1 - 0.7B & -0.6B \\ -0.15B & 1 - 0.7B \end{bmatrix}
\begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}.
$$

Therefore, each component $x_{it}$ of the model is unit-root nonstationary and follows 
an ARIMA(0,1,1) model. 

Figure 8.10 Time plots of simulated series based on model (8.31) with identity covariance matrix for 
shocks. 


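However, we can consider a linear transformation by defining 

$$
\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix}
= \begin{bmatrix} 1.0 & -2.0 \\ 0.5 & 1.0 \end{bmatrix}
\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} \equiv L x_t,
\qquad
\begin{bmatrix} b_{1t} \\ b_{2t} \end{bmatrix}
= \begin{bmatrix} 1.0 & -2.0 \\ 0.5 & 1.0 \end{bmatrix}
\begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix} \equiv L a_t.
$$

The VARMA model of the transformed series $y_t$ can be obtained as follows: 

$$
\begin{aligned}
L x_t &= L\Phi x_{t-1} + L a_t - L\Theta a_{t-1} \\
&= L\Phi L^{-1} L x_{t-1} + L a_t - L\Theta L^{-1} L a_{t-1} \\
&= L\Phi L^{-1} (L x_{t-1}) + b_t - L\Theta L^{-1} b_{t-1}.
\end{aligned}
$$

Thus, the model for $y_t$ is 

$$
\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix}
- \begin{bmatrix} 1.0 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix}
= \begin{bmatrix} b_{1t} \\ b_{2t} \end{bmatrix}
- \begin{bmatrix} 0.4 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} b_{1,t-1} \\ b_{2,t-1} \end{bmatrix}.
\tag{8.32}
$$

From the prior model, we see that (a) $y_{1t}$ and $y_{2t}$ are uncoupled series with 
concurrent correlation equal to that between the shocks $b_{1t}$ and $b_{2t}$, (b) $y_{1t}$ follows a 
univariate ARIMA(0,1,1) model, and (c) $y_{2t}$ is a white noise series (i.e., $y_{2t} = b_{2t}$). 

Figure 8.11 Sample autocorrelation functions of two simulated component series. There are 200 
observations, and model is given by Eq. (8.31) with identity covariance matrix for shocks. 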


In particular, the model in Eq. (8.32) shows that there is only a single unit root in 
the system. Consequently, the unit roots of $x_{1t}$ and $x_{2t}$ are introduced by the unit 
root of $y_{1t}$. In the literature, $y_{1t}$ is referred to as the common trend of $x_{1t}$ and $x_{2t}$. 
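
The coefficient matrices of the transformed model can be verified directly. The following sketch, with helper names chosen only for illustration, computes L Phi L^{-1} and L Theta L^{-1} for the AR and MA matrices of model (8.31) and the transformation matrix L with rows (1, -2) and (0.5, 1).

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


L = [[1.0, -2.0], [0.5, 1.0]]
phi = [[0.5, -1.0], [-0.25, 0.5]]    # AR matrix of model (8.31)
theta = [[0.2, -0.4], [-0.1, 0.2]]   # MA matrix of model (8.31)

L_inv = inv2(L)
ar_y = matmul(matmul(L, phi), L_inv)    # AR matrix of the y_t model
ma_y = matmul(matmul(L, theta), L_inv)  # MA matrix of the y_t model
print(ar_y)
print(ma_y)
```

Up to rounding, the results are diag(1, 0) and diag(0.4, 0), so the single unit root sits in the first transformed component and the second component has no dynamics.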

The phenomenon that both $x_{1t}$ and $x_{2t}$ are unit-root nonstationary, but there is 
only a single unit root in the vector series, is referred to as cointegration in the 
econometric and time series literature. Another way to define cointegration is to 
focus on linear transformations of unit-root nonstationary series. For the simulated 
example of model (8.31), the transformation shows that the linear combination 
$y_{2t} = 0.5x_{1t} + x_{2t}$ does not have a unit root. Consequently, $x_{1t}$ and $x_{2t}$ are 
cointegrated if (a) both of them are unit-root nonstationary, and (b) they have a linear 
combination that is unit-root stationary. 

Generally speaking, for a $k$-dimensional unit-root nonstationary time series, 
cointegration exists if there are fewer than $k$ unit roots in the system. Let $h$ be the number 
of unit roots in the $k$-dimensional series $x_t$. Cointegration exists if $0 < h < k$, and 
the quantity $k - h$ is called the number of cointegrating factors. Alternatively, the 
number of cointegrating factors is the number of different linear combinations 
that are unit-root stationary. The linear combinations are called the cointegrating 
vectors. For the prior simulated example, $y_{2t} = (0.5, 1)x_t$ so that $(0.5, 1)'$ is a 
cointegrating vector for the system. For more discussions on cointegration and 
cointegration tests, see Box and Tiao (1977), Engle and Granger (1987), Stock 
and Watson (1988), and Johansen (1988). We discuss cointegrated VAR models in 
Section 8.6. 
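
A small simulation makes the two roles concrete. The sketch below is an illustration, not the simulation behind Figures 8.10 and 8.11: it generates a path from model (8.31) with identity shock covariance, starting from x_0 = 0 (an assumed initial value), and checks that 0.5 x_{1t} + x_{2t} reproduces the white noise 0.5 a_{1t} + a_{2t}, while y_{1t} = x_{1t} - 2 x_{2t} satisfies the ARIMA(0,1,1) recursion of Eq. (8.32).

```python
import random

random.seed(7)
n = 200
phi = [[0.5, -1.0], [-0.25, 0.5]]
theta = [[0.2, -0.4], [-0.1, 0.2]]

# Shocks a_0, ..., a_n with identity covariance matrix.
a = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n + 1)]

# Simulate x_t = phi x_{t-1} + a_t - theta a_{t-1}, with x_0 = 0 (assumption).
x = [[0.0, 0.0]]
for t in range(1, n + 1):
    x.append([
        sum(phi[i][j] * x[t - 1][j] for j in range(2))
        + a[t][i]
        - sum(theta[i][j] * a[t - 1][j] for j in range(2))
        for i in range(2)
    ])

# Cointegrating combination y_2t = 0.5 x_1t + x_2t versus b_2t = 0.5 a_1t + a_2t.
w = [0.5 * x[t][0] + x[t][1] for t in range(1, n + 1)]
b2 = [0.5 * a[t][0] + a[t][1] for t in range(1, n + 1)]
max_gap = max(abs(u - v) for u, v in zip(w, b2))

# Common trend y_1t = x_1t - 2 x_2t: (1 - B) y_1t = b_1t - 0.4 b_1,t-1.
y1 = [x[t][0] - 2.0 * x[t][1] for t in range(n + 1)]
b1 = [a[t][0] - 2.0 * a[t][1] for t in range(n + 1)]
arima_gap = max(
    abs((y1[t] - y1[t - 1]) - (b1[t] - 0.4 * b1[t - 1])) for t in range(1, n + 1)
)
print(max_gap, arima_gap)
```

Both gaps are zero up to floating-point rounding, whatever seed is used, because (0.5, 1) annihilates both the AR and MA matrices of (8.31).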

The concept of cointegration is interesting and has attracted a lot of attention in 
the literature. However, there are difficulties in testing for cointegration in a real 
application. The main source of difficulties is that cointegration tests overlook the 
scaling effects of the component series. Interested readers are referred to Cochrane 
(1988) and Tiao, Tsay, and Wang (1993) for further discussion. 

While I have some misgivings on the practical value of cointegration tests, the 
idea of cointegration is highly relevant in financial study. For example, consider the 
stock of Finnish Nokia Corporation. Its price on the Helsinki Stock Market must 
move in unison with the price of its American Depositary Receipts on the New York 
Stock Exchange; otherwise there exists some arbitrage opportunity for investors. 
If the stock price has a unit root, then the two price series must be cointegrated. 
In practice, such a cointegration can exist after adjusting for transaction costs and 
exchange rate risk. We discuss issues like this later in Section 8.7. 


8.5.1 An Error Correction Form 


Because there are more unit-root nonstationary components than the number of 
unit roots in a cointegrated system, differencing individual components to achieve 
stationarity results in overdifferencing. Overdifferencing leads to the problem of 
unit roots in the MA matrix polynomial, which in turn may cause difficulties in 
parameter estimation. If the MA matrix polynomial contains unit roots, the vector 
time series is said to be noninvertible. 

Engle and Granger (1987) discuss an error correction representation for a 
cointegrated system that overcomes the difficulty of estimating noninvertible VARMA 
models. Consider the cointegrated system in Eq. (8.31). Let $\Delta x_t = x_t - x_{t-1}$ be 
the differenced series. Subtracting $x_{t-1}$ from both sides of the equation, we obtain 
a model for $\Delta x_t$ as 

$$
\begin{aligned}
\begin{bmatrix} \Delta x_{1t} \\ \Delta x_{2t} \end{bmatrix}
&= \begin{bmatrix} -0.5 & -1.0 \\ -0.25 & -0.5 \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix}
+ \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}
- \begin{bmatrix} 0.2 & -0.4 \\ -0.1 & 0.2 \end{bmatrix}
\begin{bmatrix} a_{1,t-1} \\ a_{2,t-1} \end{bmatrix} \\
&= \begin{bmatrix} -1 \\ -0.5 \end{bmatrix} [0.5, 1.0]
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix}
+ \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}
- \begin{bmatrix} 0.2 & -0.4 \\ -0.1 & 0.2 \end{bmatrix}
\begin{bmatrix} a_{1,t-1} \\ a_{2,t-1} \end{bmatrix}.
\end{aligned}
$$

This is a stationary model because both $\Delta x_t$ and $[0.5, 1.0]x_t = y_{2t}$ are unit-root 
stationary. Because $x_{t-1}$ is used on the right-hand side of the previous equation, 
the MA matrix polynomial is the same as before and, hence, the model does not 
encounter the problem of noninvertibility. Such a formulation is referred to as an 
error correction model for $\Delta x_t$, and it can be extended to the general cointegrated 
VARMA model. For a cointegrated VARMA($p$, $q$) model with $m$ cointegrating 
factors ($m < k$), an error correction representation is 

$$
\Delta x_t = \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1} \Phi_i^* \Delta x_{t-i} + a_t - \sum_{j=1}^{q} \Theta_j a_{t-j},
\tag{8.33}
$$

where $\alpha$ and $\beta$ are $k \times m$ full-rank matrices. The AR coefficient matrices $\Phi_i^*$ are 
functions of the original coefficient matrices $\Phi_j$. Specifically, we have 

$$
\begin{aligned}
\Phi_j^* &= -\sum_{i=j+1}^{p} \Phi_i, \qquad j = 1, \ldots, p - 1, \\
\alpha\beta' &= \Phi_p + \Phi_{p-1} + \cdots + \Phi_1 - I = -\Phi(1).
\end{aligned}
\tag{8.34}
$$

These results can be obtained by equating coefficient matrices of the AR matrix 
polynomials. The time series $\beta' x_t$ is unit-root stationary, and the columns of $\beta$ are 
the cointegrating vectors of $x_t$. 

Existence of the stationary series $\beta' x_{t-1}$ in the error correction representation 
(8.33) is natural. It can be regarded as a “compensation” term for the 
overdifferenced system $\Delta x_t$. The stationarity of $\beta' x_{t-1}$ can be justified as follows. The 
theory of unit-root time series shows that the sample correlation coefficient between 
a unit-root nonstationary series and a stationary series converges to zero as the 
sample size goes to infinity; see Tsay and Tiao (1990) and the references therein. In an 
error correction representation, $x_{t-1}$ is unit-root nonstationary, but $\Delta x_t$ is 
stationary. Therefore, the only way that $\Delta x_t$ can relate meaningfully to $x_{t-1}$ is through 
a stationary series $\beta' x_{t-1}$. 


Remark. Our discussion of cointegration assumes that all unit roots are of 
multiplicity 1, but the concept can be extended to cases in which the unit roots have 
different multiplicities. Also, if the number of cointegrating factors $m$ is given, then 
the error correction model in Eq. (8.33) can be estimated by likelihood methods. 
We discuss the simple case of cointegrated VAR models in the next section. Finally, 
there are many ways to construct an error correction representation. In fact, one 
can use any $\alpha\beta' x_{t-v}$ for $1 \le v \le p$ in Eq. (8.33) with some modifications to the 
AR coefficient matrices $\Phi_i^*$. 


8.6 COINTEGRATED VAR MODELS 


To better understand cointegration, we focus on VAR models for their simplicity 
in estimation. Consider a $k$-dimensional VAR($p$) time series $x_t$ with possible time 
trend so that the model is 

$$x_t = \mu_t + \Phi_1 x_{t-1} + \cdots + \Phi_p x_{t-p} + a_t, \tag{8.35}$$

where the innovation $a_t$ is assumed to be Gaussian and $\mu_t = \mu_0 + \mu_1 t$, where $\mu_0$ 
and $\mu_1$ are $k$-dimensional constant vectors. Write $\Phi(B) = I - \Phi_1 B - \cdots - \Phi_p B^p$. 
Recall that if all zeros of the determinant $|\Phi(B)|$ are outside the unit circle, then 
$x_t$ is unit-root stationary. In the literature, a unit-root stationary series is said to 
be an $I(0)$ process; that is, it is not integrated. If $|\Phi(1)| = 0$, then $x_t$ is unit-root 
nonstationary. For simplicity, we assume that $x_t$ is at most an integrated process of 
order 1; that is, an $I(1)$ process. This means that $(1 - B)x_{it}$ is unit-root stationary 
if $x_{it}$ itself is not. 
An error correction model (ECM) for the VAR($p$) process $x_t$ is 

$$
\Delta x_t = \mu_t + \Pi x_{t-1} + \Phi_1^* \Delta x_{t-1} + \cdots + \Phi_{p-1}^* \Delta x_{t-p+1} + a_t,
\tag{8.36}
$$

where $\Phi_j^*$ are defined in Eq. (8.34) and $\Pi = \alpha\beta' = -\Phi(1)$. We refer to the term 
$\Pi x_{t-1}$ of Eq. (8.36) as the error correction term, which plays a key role in 
cointegration study. Notice that $\Phi_i$ can be recovered from the ECM representation via 

$$
\begin{aligned}
\Phi_1 &= I + \Pi + \Phi_1^*, \\
\Phi_i &= \Phi_i^* - \Phi_{i-1}^*, \qquad i = 2, \ldots, p,
\end{aligned}
$$

where $\Phi_p^* = 0$, the zero matrix. Based on the assumption that $x_t$ is at most $I(1)$, 
$\Delta x_t$ of Eq. (8.36) is an $I(0)$ process. 
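
The mapping between the VAR and ECM parameterizations is mechanical, and a round trip is a useful sanity check. The sketch below assumes the definitions Pi = Phi_1 + ... + Phi_p - I and Phi*_j = -(Phi_{j+1} + ... + Phi_p) of Eq. (8.34). The function names and the VAR(3) coefficient matrices are hypothetical and serve only as an illustration.

```python
def mat_add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a, b):
    return [[u - v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def zeros(k):
    return [[0.0] * k for _ in range(k)]


def identity(k):
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def var_to_ecm(phis):
    """Map [Phi_1, ..., Phi_p] to (Pi, [Phi*_1, ..., Phi*_{p-1}])."""
    k, p = len(phis[0]), len(phis)
    total = zeros(k)
    for m in phis:
        total = mat_add(total, m)
    pi = mat_sub(total, identity(k))
    stars = []
    for j in range(1, p):          # j = 1, ..., p-1
        s = zeros(k)
        for m in phis[j:]:         # Phi_{j+1}, ..., Phi_p
            s = mat_sub(s, m)
        stars.append(s)
    return pi, stars


def ecm_to_var(pi, stars):
    """Recover [Phi_1, ..., Phi_p] from (Pi, [Phi*_1, ..., Phi*_{p-1}])."""
    k = len(pi)
    padded = stars + [zeros(k)]    # Phi*_p = 0
    phis = [mat_add(mat_add(identity(k), pi), padded[0])]
    for i in range(1, len(padded)):
        phis.append(mat_sub(padded[i], padded[i - 1]))
    return phis


# Hypothetical VAR(3) coefficient matrices.
phis = [
    [[0.6, 0.1], [0.2, 0.5]],
    [[0.2, -0.1], [0.0, 0.3]],
    [[0.1, 0.05], [-0.1, 0.1]],
]
pi, stars = var_to_ecm(phis)
recovered = ecm_to_var(pi, stars)
print(pi)
print(recovered)
```

The recovered matrices agree with the inputs up to rounding, which confirms that the two representations carry the same information.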

If $x_t$ contains unit roots, then $|\Phi(1)| = 0$ so that $\Pi = -\Phi(1)$ is singular. 
Therefore, three cases are of interest in considering the ECM in Eq. (8.36): 


1. Rank($\Pi$) = 0. This implies $\Pi = 0$ and $x_t$ is not cointegrated. The ECM of 
Eq. (8.36) reduces to 

$$\Delta x_t = \mu_t + \Phi_1^* \Delta x_{t-1} + \cdots + \Phi_{p-1}^* \Delta x_{t-p+1} + a_t,$$

so that $\Delta x_t$ follows a VAR($p - 1$) model with deterministic trend $\mu_t$. 

2. Rank($\Pi$) = $k$. This implies that $|\Phi(1)| \neq 0$ and $x_t$ contains no unit roots; 
that is, $x_t$ is $I(0)$. The ECM model is not informative and one studies $x_t$ 
directly. 

3. 0 < Rank($\Pi$) = $m$ < $k$. In this case, one can write $\Pi$ as 

$$\Pi = \alpha\beta', \tag{8.37}$$

where $\alpha$ and $\beta$ are $k \times m$ matrices with Rank($\alpha$) = Rank($\beta$) = $m$. The ECM 
of Eq. (8.36) becomes 

$$
\Delta x_t = \mu_t + \alpha\beta' x_{t-1} + \Phi_1^* \Delta x_{t-1} + \cdots + \Phi_{p-1}^* \Delta x_{t-p+1} + a_t.
\tag{8.38}
$$

This means that $x_t$ is cointegrated with $m$ linearly independent cointegrating 
vectors, $w_t = \beta' x_t$, and has $k - m$ unit roots that give $k - m$ common 
stochastic trends of $x_t$. 


If $x_t$ is cointegrated with Rank($\Pi$) = $m$, then a simple way to obtain a 
representation of the $k - m$ common trends is to obtain an orthogonal complement matrix 
$\alpha_\perp$ of $\alpha$; that is, $\alpha_\perp$ is a $k \times (k - m)$ matrix such that $\alpha_\perp' \alpha = 0$, a $(k - m) \times m$ 
zero matrix, and use $y_t = \alpha_\perp' x_t$. To see this, one can premultiply the ECM by $\alpha_\perp'$ 
and use $\Pi = \alpha\beta'$ to see that there would be no error correction term in the 
resulting equation. Consequently, the $(k - m)$-dimensional series $y_t$ should have $k - m$ 
unit roots. For illustration, consider the bivariate example of Section 8.5.1. For this 
special series, $\alpha = (-1, -0.5)'$ and $\alpha_\perp = (1, -2)'$. Therefore, $y_t = (1, -2)x_t = 
x_{1t} - 2x_{2t}$, which is precisely the unit-root nonstationary series $y_{1t}$ in Eq. (8.32). 
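
For the bivariate example these statements reduce to a few products that can be checked by hand or in code. The sketch below uses the AR matrix of model (8.31) with alpha = (-1, -0.5)', beta = (0.5, 1)' and the orthogonal complement (1, -2)'. Variable names are illustrative.

```python
phi = [[0.5, -1.0], [-0.25, 0.5]]   # AR matrix of model (8.31)
alpha = [-1.0, -0.5]
beta = [0.5, 1.0]
alpha_perp = [1.0, -2.0]

# Pi = Phi - I for a model with a single AR lag.
pi = [[phi[i][j] - (1.0 if i == j else 0.0) for j in range(2)] for i in range(2)]

# Rank-one factorization Pi = alpha beta'.
outer = [[alpha[i] * beta[j] for j in range(2)] for i in range(2)]

# Orthogonality alpha_perp' alpha = 0, hence alpha_perp' Pi = 0.
orth = sum(u * v for u, v in zip(alpha_perp, alpha))
killed = [sum(alpha_perp[i] * pi[i][j] for i in range(2)) for j in range(2)]
print(pi == outer, orth, killed)
```

Because alpha_perp' Pi vanishes, premultiplying the ECM by alpha_perp' removes the error correction term, leaving x_{1t} - 2 x_{2t} as the series that carries the unit root.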

Note that the factorization in Eq. (8.37) is not unique because for any $m \times m$ 
orthogonal matrix $\Omega$ satisfying $\Omega\Omega' = I$, we have 

$$\alpha\beta' = \alpha\Omega\Omega'\beta' = (\alpha\Omega)(\beta\Omega)' \equiv \alpha_*\beta_*',$$

where both $\alpha_*$ and $\beta_*$ are also of rank $m$. Additional constraints are needed to 
uniquely identify $\alpha$ and $\beta$. It is common to require that $\beta' = [I_m, \beta_1']$, where $I_m$ 
is the $m \times m$ identity matrix and $\beta_1$ is a $(k - m) \times m$ matrix. In practice, this may 
require reordering of the elements of $x_t$ such that the first $m$ components all have 
a unit root. The elements of $\alpha$ and $\beta$ must also satisfy other constraints for the 
process $w_t = \beta' x_t$ to be unit-root stationary. For example, consider the case of a 
bivariate VAR(1) model with one cointegrating vector. Here $k = 2$, $m = 1$, and the 
ECM is 

$$
\Delta x_t = \mu_t + \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} [1, \beta_1] x_{t-1} + a_t.
$$

Premultiplying the prior equation by $\beta'$, using $w_{t-1} = \beta' x_{t-1}$, and moving $w_{t-1}$ to 
the right-hand side of the equation, we obtain 

$$w_t = \beta'\mu_t + (1 + \alpha_1 + \alpha_2\beta_1) w_{t-1} + b_t,$$

where $b_t = \beta' a_t$. This implies that $w_t$ is a stationary AR(1) process. Consequently, 
$\alpha_i$ and $\beta_1$ must satisfy the stationarity constraint $|1 + \alpha_1 + \alpha_2\beta_1| < 1$. 
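
The stationarity constraint is easy to evaluate for given loadings. As a sketch, the helper below (a hypothetical name) returns the AR(1) coefficient 1 + alpha_1 + alpha_2 beta_1 of w_t. It is applied first to the example of Section 8.5.1 after rescaling the cointegrating vector (0.5, 1) to (1, 2), which rescales alpha to (-0.5, -0.25), and then to a second, purely illustrative pair of values.

```python
def ar1_coefficient(alpha, beta1):
    """AR(1) coefficient of w_t = x_1t + beta1 * x_2t in a bivariate VAR(1) ECM."""
    return 1.0 + alpha[0] + alpha[1] * beta1


# Example of Section 8.5.1 with beta' normalized to [1, 2].
coef_example = ar1_coefficient([-0.5, -0.25], 2.0)

# Hypothetical loadings with a slower error correction.
coef_slow = ar1_coefficient([-0.1, 0.05], -1.0)

print(coef_example, coef_slow)
for coef in (coef_example, coef_slow):
    print(abs(coef) < 1.0)  # stationarity constraint holds
```

The first coefficient is zero, in agreement with the finding that the cointegrating combination of model (8.31) is white noise. The second value, about 0.85, describes a stationary but persistent equilibrium error.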

The prior discussion shows that the rank of $\Pi$ in the ECM of Eq. (8.36) is the 
number of cointegrating vectors. Thus, to test for cointegration, one can examine 
the rank of $\Pi$. This is the approach taken by Johansen (1988, 1995) and Reinsel 
and Ahn (1992). 


8.6.1 Specification of the Deterministic Function 


Similar to the univariate case, the limiting distributions of cointegration tests depend 
on the deterministic function $\mu_t$. In this section, we discuss some specifications of 
$\mu_t$ that have been proposed in the literature. To understand some of the statements 
made below, keep in mind that $\alpha_\perp' x_t$ provides a representation for the common 
stochastic trends of $x_t$ if it is cointegrated. 


1. $\mu_t = 0$: In this case, all the component series of $x_t$ are $I(1)$ without drift 
and the stationary series $w_t = \beta' x_t$ has mean zero. 

2. $\mu_t = \mu_0 = \alpha c_0$, where $c_0$ is an $m$-dimensional nonzero constant vector. The 
ECM becomes 

$$\Delta x_t = \alpha(\beta' x_{t-1} + c_0) + \Phi_1^* \Delta x_{t-1} + \cdots + \Phi_{p-1}^* \Delta x_{t-p+1} + a_t,$$

so that the components of $x_t$ are $I(1)$ without drift, but $w_t$ has a nonzero 
mean $-c_0$. This is referred to as the case of restricted constant. 

3. $\mu_t = \mu_0$, which is nonzero. Here the component series of $x_t$ are $I(1)$ with 
drift $\mu_0$ and $w_t$ may have a nonzero mean. 

4. $\mu_t = \mu_0 + \alpha c_1 t$, where $c_1$ is a nonzero vector. The ECM becomes 

$$\Delta x_t = \mu_0 + \alpha(\beta' x_{t-1} + c_1 t) + \Phi_1^* \Delta x_{t-1} + \cdots + \Phi_{p-1}^* \Delta x_{t-p+1} + a_t,$$

so that the components of $x_t$ are $I(1)$ with drift $\mu_0$ and $w_t$ has a linear time 
trend related to $c_1 t$. This is the case of restricted trend. 

5. $\mu_t = \mu_0 + \mu_1 t$, where $\mu_i$ are nonzero. Here both the constant and trend are 
unrestricted. The components of $x_t$ are $I(1)$ and have a quadratic time trend 
and $w_t$ has a linear trend. 


Obviously, the last case is not common in empirical work. The first case is not 
common for economic time series but may represent the log price series of some 
assets. The third case is also useful in modeling asset prices. 


8.6.2 Maximum-Likelihood Estimation 


In this section, we briefly outline the maximum-likelihood estimation (MLE) of a 
cointegrated VAR($p$) model. Suppose that the data are $\{x_t \mid t = 1, \ldots, T\}$. Without 
loss of generality, we write $\mu_t = \mu d_t$, where $d_t = [1, t]'$, and it is understood that 
$\mu_t$ depends on the specification of the previous section. For a given $m$, which is 
the rank of $\Pi$, the ECM model becomes 

$$
\Delta x_t = \mu d_t + \alpha\beta' x_{t-1} + \Phi_1^* \Delta x_{t-1} + \cdots + \Phi_{p-1}^* \Delta x_{t-p+1} + a_t,
\tag{8.39}
$$

where $t = p + 1, \ldots, T$. A key step in the estimation is to concentrate the 
likelihood function with respect to the deterministic term and the stationary effects. This 
is done by considering the following two multivariate linear regressions: 

$$
\begin{aligned}
\Delta x_t &= \gamma_0 d_t + \Omega_1 \Delta x_{t-1} + \cdots + \Omega_{p-1} \Delta x_{t-p+1} + u_t, && (8.40) \\
x_{t-1} &= \gamma_1 d_t + \Xi_1 \Delta x_{t-1} + \cdots + \Xi_{p-1} \Delta x_{t-p+1} + v_t. && (8.41)
\end{aligned}
$$

Let $\hat{u}_t$ and $\hat{v}_t$ be the residuals of Eqs. (8.40) and (8.41), respectively. Define the 
sample covariance matrices 

$$
S_{00} = \frac{1}{T - p} \sum_{t=p+1}^{T} \hat{u}_t \hat{u}_t', \qquad
S_{01} = \frac{1}{T - p} \sum_{t=p+1}^{T} \hat{u}_t \hat{v}_t', \qquad
S_{11} = \frac{1}{T - p} \sum_{t=p+1}^{T} \hat{v}_t \hat{v}_t',
$$

with $S_{10} = S_{01}'$. Next, compute the eigenvalues and eigenvectors of $S_{10} S_{00}^{-1} S_{01}$ with respect to $S_{11}$. 
This amounts to solving the eigenvalue problem 

$$|\lambda S_{11} - S_{10} S_{00}^{-1} S_{01}| = 0.$$

Denote the eigenvalue and eigenvector pairs by $(\hat{\lambda}_i, e_i)$, where $\hat{\lambda}_1 > \hat{\lambda}_2 > \cdots > \hat{\lambda}_k$. 
Here the eigenvectors are normalized so that $e' S_{11} e = I$, where $e = [e_1, \ldots, e_k]$ 
is the matrix of eigenvectors. 

The unnormalized MLE of the cointegrating vector $\beta$ is $\hat{\beta} = [e_1, \ldots, e_m]$, 
from which we can obtain an MLE for $\beta$ that satisfies the identifying constraint 
and normalization condition. Denote the resulting estimate by $\hat{\beta}_c$ with the subscript 
$c$ signifying constraints. The MLE of other parameters can then be obtained by the 
multivariate linear regression 

$$
\Delta x_t = \mu d_t + \alpha \hat{\beta}_c' x_{t-1} + \Phi_1^* \Delta x_{t-1} + \cdots + \Phi_{p-1}^* \Delta x_{t-p+1} + a_t.
$$

The maximized value of the likelihood function based on $m$ cointegrating vectors is 

$$
L_{\max}^{-2/T} \propto |S_{00}| \prod_{i=1}^{m} (1 - \hat{\lambda}_i).
$$



This value is used in the maximum-likelihood ratio test for testing Rank($\Pi$) = $m$. 
Finally, estimates of the orthogonal complements of $\alpha$ and $\beta$ can be obtained using 

$$
\hat{\alpha}_\perp = S_{00}^{-1} S_{01} [e_{m+1}, \ldots, e_k], \qquad
\hat{\beta}_\perp = S_{11} [e_{m+1}, \ldots, e_k].
$$
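
To make the eigenvalue step concrete, here is a minimal sketch for the simplest possible setting: a bivariate VAR(1) with no deterministic term, so that the auxiliary regressions (8.40) and (8.41) have no regressors and the residuals are simply the differenced series and the lagged level. The data are simulated from a hypothetical cointegrated VAR(1) whose AR matrix is borrowed from model (8.31) without its MA part. All names, the seed, and the sample size are assumptions of the illustration, and the code is not a general implementation of Johansen's procedure.

```python
import math
import random


def matmul(a, b):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def moment(u, v):
    """Sample moment matrix (1/n) * sum_t u_t v_t' for lists of 2-vectors."""
    n = len(u)
    return [[sum(u[t][i] * v[t][j] for t in range(n)) / n for j in range(2)]
            for i in range(2)]


random.seed(11)
n_obs = 500
phi = [[0.5, -1.0], [-0.25, 0.5]]   # eigenvalues 0 and 1: one unit root

# Simulate x_t = phi x_{t-1} + a_t with x_0 = 0 and identity shock covariance.
x = [[0.0, 0.0]]
for _ in range(n_obs):
    prev = x[-1]
    x.append([phi[i][0] * prev[0] + phi[i][1] * prev[1] + random.gauss(0, 1)
              for i in range(2)])

# With p = 1 and no deterministic term: u_t = Delta x_t and v_t = x_{t-1}.
u = [[x[t][i] - x[t - 1][i] for i in range(2)] for t in range(1, n_obs + 1)]
v = [x[t - 1] for t in range(1, n_obs + 1)]

s00, s01 = moment(u, u), moment(u, v)
s10, s11 = moment(v, u), moment(v, v)

# Roots of |lambda S11 - S10 S00^{-1} S01| = 0 are the eigenvalues of
# S11^{-1} S10 S00^{-1} S01, obtained here from the 2 x 2 characteristic equation.
m = matmul(inv2(s11), matmul(s10, matmul(inv2(s00), s01)))
tr = m[0][0] + m[1][1]
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
lam1, lam2 = (tr + disc) / 2.0, (tr - disc) / 2.0
print(lam1, lam2)
```

One eigenvalue is clearly away from zero and the other is close to zero, which is the pattern expected when the system has a single cointegrating vector.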


8.6.3 Cointegration Test 


For a specified deterministic term $\mu_t$, we now discuss the maximum-likelihood test 
for testing the rank of the $\Pi$ matrix in Eq. (8.36). Let $H(m)$ be the null hypothesis 
that the rank of $\Pi$ is $m$. For example, under $H(0)$, Rank($\Pi$) = 0 so that $\Pi = 0$ 
and there is no cointegration. The hypotheses of interest are 

$$H(0) \subset \cdots \subset H(m) \subset \cdots \subset H(k).$$

For testing purpose, the ECM in Eq. (8.39) becomes 

$$
\Delta x_t = \mu d_t + \Pi x_{t-1} + \Phi_1^* \Delta x_{t-1} + \cdots + \Phi_{p-1}^* \Delta x_{t-p+1} + a_t,
$$


where $t = p + 1, \ldots, T$. Our goal is to test the rank of $\Pi$. Mathematically, the 
rank of $\Pi$ is the number of nonzero eigenvalues of $\Pi$, which can be obtained if a 
consistent estimate of $\Pi$ is available. Based on the prior equation, which is in the 
form of a multivariate linear regression, we see that $\Pi$ is related to the covariance 
matrix between $x_{t-1}$ and $\Delta x_t$ after adjusting for the effects of $d_t$ and $\Delta x_{t-i}$ for 
$i = 1, \ldots, p - 1$. The necessary adjustments can be achieved by the techniques of 
multivariate linear regression shown in the previous section. Indeed, the adjusted 
series of $x_{t-1}$ and $\Delta x_t$ are $\hat{v}_t$ and $\hat{u}_t$, respectively. The equation of interest for the 
cointegration test then becomes 

$$\hat{u}_t = \Pi \hat{v}_t + a_t.$$

Under the normality assumption, the likelihood ratio test for testing the rank of 
$\Pi$ in the prior equation can be done by using the canonical correlation analysis 
between $\hat{u}_t$ and $\hat{v}_t$. See Johnson and Wichern (1998) for information on canonical 
correlation analysis. The associated canonical correlations are the partial canonical 
correlations between $\Delta x_t$ and $x_{t-1}$ because the effects of $d_t$ and $\Delta x_{t-i}$ have 
been adjusted. The quantities $\{\hat{\lambda}_i\}$ are the squared canonical correlations between 
$\hat{u}_t$ and $\hat{v}_t$. 
Consider the hypotheses 

$$H_0: \text{Rank}(\Pi) = m \quad \text{versus} \quad H_a: \text{Rank}(\Pi) > m.$$

Johansen (1988) proposes the likelihood ratio (LR) statistic 

$$
LR_{tr}(m) = -(T - p) \sum_{i=m+1}^{k} \ln(1 - \hat{\lambda}_i)
\tag{8.42}
$$

to perform the test. If Rank($\Pi$) = $m$, then $\hat{\lambda}_i$ should be small for $i > m$ and hence 
$LR_{tr}(m)$ should be small. This test is referred to as the trace cointegration test. Due 
to the presence of unit roots, the asymptotic distribution of $LR_{tr}(m)$ is not chi squared 
but a function of standard Brownian motions. Thus, critical values of $LR_{tr}(m)$ must 
be obtained via simulation. 

Johansen (1988) also considers a sequential procedure to determine the number 
of cointegrating vectors. Specifically, the hypotheses of interest are 

$$H_0: \text{Rank}(\Pi) = m \quad \text{versus} \quad H_a: \text{Rank}(\Pi) = m + 1.$$

The LR test statistic, called the maximum eigenvalue statistic, is 

$$LR_{\max}(m) = -(T - p) \ln(1 - \hat{\lambda}_{m+1}).$$

Again, critical values of the test statistics are nonstandard and must be evaluated 
via simulation. 
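
Both statistics are simple functions of the ordered eigenvalues. The sketch below implements Eq. (8.42) and the maximum eigenvalue statistic, and evaluates them at the eigenvalues 0.0322 and 0.0023 reported, to four decimals, in the S-Plus output of Section 8.6.5 with T = 2383 and p = 3. Because the inputs are rounded, and because the effective sample size used by the software is an assumption here, the results should be close to the printed statistics but need not match them exactly. The function names are illustrative.

```python
import math


def trace_statistic(eigenvalues, m, n_eff):
    """LR_tr(m) = -n_eff * sum_{i > m} ln(1 - lambda_i); eigenvalues in descending order."""
    return -n_eff * sum(math.log(1.0 - lam) for lam in eigenvalues[m:])


def max_eigen_statistic(eigenvalues, m, n_eff):
    """LR_max(m) = -n_eff * ln(1 - lambda_{m+1}); eigenvalues in descending order."""
    return -n_eff * math.log(1.0 - eigenvalues[m])


eigenvalues = [0.0322, 0.0023]   # rounded values from the output in Section 8.6.5
n_eff = 2383 - 3                 # T - p, assumed effective sample size

trace_0 = trace_statistic(eigenvalues, 0, n_eff)
trace_1 = trace_statistic(eigenvalues, 1, n_eff)
max_0 = max_eigen_statistic(eigenvalues, 0, n_eff)
max_1 = max_eigen_statistic(eigenvalues, 1, n_eff)
print(round(trace_0, 2), round(trace_1, 2), round(max_0, 2), round(max_1, 2))
```

The two statistics coincide for the last hypothesis, m = k - 1, since the trace sum then contains a single term.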


8.6.4 Forecasting of Cointegrated VAR Models 


The fitted ECM can be used to produce forecasts. First, conditioned on the estimated 
parameters, the ECM equation can be used to produce forecasts of the differenced 
series $\Delta x_t$. Such forecasts can in turn be used to obtain forecasts of $x_t$. A difference 
between ECM forecasts and the traditional VAR forecasts is that the ECM approach 
imposes the cointegration relationships in producing the forecasts. 
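
The recursion can be sketched as follows for an ECM with a single cointegrating vector and no deterministic term. The function name, the loadings alpha = (-0.2, 0.1)', the cointegrating vector beta = (1, -1)' and the starting value are hypothetical, chosen only to show how forecasts of the differences are accumulated into forecasts of the levels.

```python
def ecm_forecast(history, alpha, beta, stars, steps):
    """Iterate Delta x_{t+1} = alpha * (beta' x_t) + sum_i stars[i-1] Delta x_{t+1-i}.

    history holds past levels (most recent last) and needs len(stars) + 1 entries.
    """
    k = len(alpha)
    path = [list(obs) for obs in history]
    for _ in range(steps):
        last = path[-1]
        w = sum(b * value for b, value in zip(beta, last))   # equilibrium error
        dx = [alpha[i] * w for i in range(k)]
        for lag, star in enumerate(stars, start=1):
            diff = [path[-lag][j] - path[-lag - 1][j] for j in range(k)]
            for i in range(k):
                dx[i] += sum(star[i][j] * diff[j] for j in range(k))
        path.append([last[i] + dx[i] for i in range(k)])
    return path[len(history):]


alpha = [-0.2, 0.1]      # hypothetical loadings
beta = [1.0, -1.0]       # hypothetical cointegrating vector
forecasts = ecm_forecast([[5.0, 4.0]], alpha, beta, stars=[], steps=60)

final = forecasts[-1]
spread = final[0] - final[1]                   # beta' x at the longest horizon
trend = 0.1 * final[0] + 0.2 * final[1]        # alpha_perp' x with alpha_perp = (0.1, 0.2)'
print(final, spread, trend)
```

In this illustration the forecast spread decays geometrically to zero, so long-horizon forecasts satisfy the cointegrating relation, while the combination along the orthogonal complement of alpha stays at its last observed value of 1.3.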
Figure 8.12 Time plots of weekly U.S. interest rate from December 12, 1958, to August 6, 2004. 
(a) The 3-month Treasury bill rate and (b) 6-month Treasury bill rate. Rates are from secondary market. 


8.6.5 An Example 


To demonstrate the analysis of cointegrated VAR models, we consider two weekly 
U.S. short-term interest rates. The series are the 3-month Treasury bill (TB) rate 
and 6-month Treasury bill rate from December 12, 1958, to August 6, 2004, for 
2383 observations. The TB rates are from the secondary market and obtained from 
the Federal Reserve Bank of St. Loius. Figure 8.12 shows the time plots of the 
interest rates. As expected, the two series move closely together. 

Our analysis uses the S-Plus software with commands VAR for VAR analy- 
sis, coint for cointegration test, and VECM for vector error correction estima- 
tion. Denote the two series by tb3m and tb6m and define the vector series x; = 
(tb3m,, tb6m,)’. The augmented Dickey—Fuller unit-root tests fail to reject the 
hypothesis of a unit root in the individual series; see Chapter 2. Indeed, the test 
statistics are —2.34 and —2.33 with p value about 0.16 for the 3-month and 6-month 
interest rate when an AR(3) model is used. Thus, we proceed to VAR modeling. 

For the bivariate series x,, the BIC criterion selects a VAR(3) model: 


> x=cbind(tb3m, tb6m) 
> y=data. frame (x) 

> ord.choiceSar.order 
[1] 3 
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To perform a cointegration test, we choose a restricted constant for m, because 
there is no reason a priori to believe the existence of a drift in the U.S. interest 
rate. Both Johansen’s tests confirm that the two series are cointegrated with one 
cointegrating vector when a VAR(3) model is entertained. 


> cointst.rc=coint(x,trend='rc', lags=2) % lags = p-1.
> cointst.rc

Call:
coint(Y = x, lags = 2, trend = "rc")

Trend Specification:
H1*(r): Restricted constant

Trace tests sign. at the 5% level are flagged by ' +'.
Trace tests sign. at the 1% level are flagged by '++'.
Max Eig. tests sign. at the 5% level are flagged by ' *'.
Max Eig. tests sign. at the 1% level are flagged by '**'.

Tests for Cointegration Rank:
         Eigenvalue Trace Stat  95% CV  99% CV
H(0)++**     0.0322    83.2712   19.96   24.60
H(1)         0.0023     5.4936    9.24   12.97

         Max Stat  95% CV  99% CV
H(0)++**  77.7776   15.67   20.20
H(1)       5.4936    9.24   12.97


Next, we perform the maximum-likelihood estimation of the specified cointe- 
grated VAR(3) model using an ECM presentation. The results are as follows: 


> vecm.fit=VECM(cointst.rc)
> summary(vecm.fit)

Call:
VECM(test = cointst.rc)

Cointegrating Vectors:
             coint.1
tb3m          1.0000

tb6m         -1.0124
(std.err)     0.0086
(t.stat)   -118.2799

Intercept*    0.2254
(std.err)     0.0545
(t.stat)      4.1382

VECM Coefficients:
              tb3m     tb6m
coint.1    -0.0949  -0.0211
(std.err)   0.0199   0.0179
(t.stat)   -4.7590  -1.1775

tb3m.lag1   0.0466  -0.0419
(std.err)   0.0480   0.0432
(t.stat)    0.9696  -0.9699

tb6m.lag1   0.2650   0.3164
(std.err)   0.0538   0.0484
(t.stat)    4.9263   6.5385

tb3m.lag2  -0.2067  -0.0346
(std.err)   0.0481   0.0433
(t.stat)   -4.2984  -0.8005

tb6m.lag2   0.2547   0.0994
(std.err)   0.0543   0.0488
(t.stat)    4.6936   2.0356


Regression Diagnostics: 
tb3m tb6m 
R-squared 0.1081 0.0913 
Adj. R-squared 0.1066 0.0898 
Resid. Scale 0.2009 0.1807 


> plot(vecm.fit)
Make a plot selection (or 0 to exit): 


1: plot: ALL 
2: plot: Response and Fitted Values 
3: plot: Residuals 


13: plot: PACF of Squared Cointegrating Residuals 
Selection: 


As expected, the output shows that the stationary series is $w_t \approx \text{tb3m}_t - \text{tb6m}_t$
and the mean of $w_t$ is about -0.225. The fitted ECM is

$$\Delta x_t = \begin{bmatrix} -0.09 \\ -0.02 \end{bmatrix}(w_{t-1} + 0.23) + \begin{bmatrix} 0.05 & 0.27 \\ -0.04 & 0.32 \end{bmatrix}\Delta x_{t-1} + \begin{bmatrix} -0.21 & 0.25 \\ -0.03 & 0.10 \end{bmatrix}\Delta x_{t-2} + a_t,$$

and the estimated standard errors of $a_{it}$ are 0.20 and 0.18, respectively. Adequacy
of the fitted ECM can be examined via various plots. For illustration,
Figure 8.13 shows the cointegrating residuals. Some large residuals are shown




Figure 8.13 Time plot of cointegrating residuals for an ECM fit to weekly U.S. interest rate series. 
Data span is from December 12, 1958, to August 6, 2004. 


in the plot, which occurred in the early 1980s when the interest rates were high 
and volatile. 
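
To make the role of each term in the fitted ECM concrete, the following Python sketch evaluates the conditional mean of $\Delta x_t$ from the rounded coefficients reported above. The function name `ecm_step` and the input values are illustrative assumptions; they are not part of the S-Plus output.

```python
# Rounded coefficients of the fitted ECM reported in the text.
ALPHA = [-0.09, -0.02]                  # loadings on the cointegrating residual
PHI1 = [[0.05, 0.27], [-0.04, 0.32]]    # coefficients of the lag-1 differences
PHI2 = [[-0.21, 0.25], [-0.03, 0.10]]   # coefficients of the lag-2 differences
W_SHIFT = 0.23                          # the ECM term is (w_{t-1} + 0.23)


def ecm_step(w_prev, dx1, dx2):
    """Conditional mean of (d tb3m_t, d tb6m_t) given w_{t-1} and two lagged differences."""
    out = []
    for i in range(2):
        value = ALPHA[i] * (w_prev + W_SHIFT)
        value += sum(PHI1[i][j] * dx1[j] for j in range(2))
        value += sum(PHI2[i][j] * dx2[j] for j in range(2))
        out.append(value)
    return out


# Hypothetical inputs: at equilibrium with no recent changes, no change is expected.
print(ecm_step(-0.23, [0.0, 0.0], [0.0, 0.0]))
# Hypothetical inputs: a spread 0.10 above its mean pulls both rates down,
# the 3-month rate more strongly than the 6-month rate.
print(ecm_step(-0.13, [0.0, 0.0], [0.0, 0.0]))
```

Because only the first loading is statistically significant in the fitted model, most of the adjustment toward equilibrium comes through the 3-month rate.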

Finally, we use the fitted ECM to produce 1-step- to 10-step-ahead forecasts for
both $\Delta x_t$ and $x_t$. The forecast origin is August 6, 2004.


> vecm.fst=predict(vecm.fit, n.predict=10)
> summary(vecm.fst)
Predicted Values with Standard Errors:

                  tb3m     tb6m
1-step-ahead   -0.0378  -0.0642
(std.err)       0.2009   0.1807
2-step-ahead   -0.0870  -0.0864
(std.err)       0.3222   0.2927


10-step-ahead  -0.2276  -0.1314
(std.err)       0.8460   0.8157
> plot(vecm.fst,xold=diff(x),n.old=12)


> vecm.fit.level=VECM(cointst.rc,levels=T)
> vecm.fst.level=predict(vecm.fit.level, n.predict=10)
> summary(vecm.fst.level)

Predicted Values with Standard Errors:
                 tb3m    tb6m
1-step-ahead   1.4501  1.7057
(std.err)      0.2009  0.1807
2-step-ahead   1.4420  1.7017
(std.err)      0.3222  0.2927


10-step-ahead  1.4722  1.7078
(std.err)      0.8460  0.8157
> plot(vecm.fst.level, xold=x, n.old=50)


Figure 8.14 Forecasting plots of fitted ECM model for weekly U.S. interest rate series. Forecasts are
for differenced series and forecast origin is August 6, 2004.


The forecasts are shown in Figures 8.14 and 8.15 for the differenced data and the 
original series, respectively, along with some observed data points. The dashed 
lines in the plots are pointwise 95% confidence intervals. Because of unit-root 
nonstationarity, the intervals are wide and not informative. 


Remark. The package urca of R can be used to perform Johansen's cointegration
test. The command is ca.jo, which requires the specification of some subcommands.
See the section on pairs trading for a demonstration.


Figure 8.15 Forecasting plots of fitted ECM model for weekly U.S. interest rate series. Forecasts are
for interest rates and forecast origin is August 6, 2004.


8.7 THRESHOLD COINTEGRATION AND ARBITRAGE


In this section, we focus on detecting arbitrage opportunities in index trading by
using multivariate time series methods. We also demonstrate that simple univariate
nonlinear models of Chapter 4 can be extended naturally to the multivariate case
in conjunction with the idea of cointegration.

Our study considers the relationship between the price of the S&P 500 index
futures and the price of the shares underlying the index on the cash market. Let
$f_{t,\ell}$ be the log price of the index futures at time $t$ with maturity $\ell$, and let $s_t$ be
the log price of the shares underlying the index on the cash market at time $t$. A
version of the cost-of-carry model in the finance literature states

$$f_{t,\ell} - s_t = (r_{t,\ell} - q_{t,\ell})(\ell - t) + z_t^*, \tag{8.43}$$

where $r_{t,\ell}$ is the risk-free interest rate, $q_{t,\ell}$ is the dividend yield with respect to
the cash price at time $t$, and $(\ell - t)$ is the time to maturity of the futures contract;
see Brenner and Kroner (1995), Dwyer, Locke, and Yu (1996), and the references
therein.
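
As a small numerical illustration of Eq. (8.43), the Python sketch below solves the equation for the deviation $z_t^*$. The function name, the price levels, and the rates are hypothetical; the interest rate and dividend yield are assumed to be quoted per unit of the same time scale used for the time to maturity.

```python
import math


def cost_of_carry_deviation(futures_price, cash_price, r, q, time_to_maturity):
    """Return z*_t = (f - s) - (r - q) * (l - t), with f and s the log prices."""
    f = math.log(futures_price)
    s = math.log(cash_price)
    return (f - s) - (r - q) * time_to_maturity


# Hypothetical inputs: annualized rates and a maturity of 0.1 year.
z_star = cost_of_carry_deviation(451.0, 450.0, r=0.03, q=0.028, time_to_maturity=0.1)
z = 100 * z_star   # rescaled deviation, which is easier to read because z* is small
print(z_star, z)
```

A futures price equal to the cash price compounded at the net carrying rate gives a deviation of zero, which is the no-arbitrage benchmark.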

The $z_t^*$ process of model (8.43) must be unit-root stationary; otherwise there
exist persistent arbitrage opportunities. Here an arbitrage trading consists of
simultaneously buying (short-selling) the security index and selling (buying) the index
futures whenever the log prices diverge by more than the cost of carrying the index
over time until maturity of the futures contract. Under the weak stationarity of $z_t^*$,
for arbitrage to be profitable, $z_t^*$ must exceed a certain value in modulus determined
by transaction costs and other economic and risk factors.

It is commonly believed that the $f_{t,\ell}$ and $s_t$ series of the S&P 500 index contain
a unit root, but Eq. (8.43) indicates that they are cointegrated after adjusting for
the effect of interest rate and dividend yield. The cointegrating vector is $(1, -1)$
after the adjustment, and the cointegrated series is $z_t^*$. Therefore, one should use
an error correction form to model the return series $r_t = (\Delta f_t, \Delta s_t)'$, where
$\Delta f_t = f_{t,\ell} - f_{t-1,\ell}$ and $\Delta s_t = s_t - s_{t-1}$, where for ease in notation we drop the maturity
time $\ell$ from the subscript of $\Delta f_t$.


8.7.1 Multivariate Threshold Model 


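In practice, arbitrage tradings affect the dynamic of the market, and hence the
model for $r_t$ may vary over time depending on the presence or absence of arbitrage
tradings. Consequently, the prior discussions lead naturally to the following model:

$$r_t = \begin{cases} c_1 + \sum_{i=1}^{p} \Phi_i^{(1)} r_{t-i} + \beta_1 z_{t-1} + a_t^{(1)} & \text{if } z_{t-1} \le \gamma_1, \\ c_2 + \sum_{i=1}^{p} \Phi_i^{(2)} r_{t-i} + \beta_2 z_{t-1} + a_t^{(2)} & \text{if } \gamma_1 < z_{t-1} \le \gamma_2, \\ c_3 + \sum_{i=1}^{p} \Phi_i^{(3)} r_{t-i} + \beta_3 z_{t-1} + a_t^{(3)} & \text{if } \gamma_2 < z_{t-1}, \end{cases} \tag{8.44}$$

where $z_t = 100 z_t^*$, $\gamma_1 < 0 < \gamma_2$ are two real numbers, and $\{a_t^{(i)}\}$ are sequences
of two-dimensional white noises and are independent of each other. Here we use
$z_t = 100 z_t^*$ because the actual value of $z_t^*$ is relatively small.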

The model in Eq. (8.44) is referred to as a multivariate threshold model with
three regimes. The two real numbers $\gamma_1$ and $\gamma_2$ are the thresholds and $z_{t-1}$ is
the threshold variable. The threshold variable $z_{t-1}$ is supported by the data; see
Tsay (1998). In general, one can select $z_{t-d}$ as a threshold variable by considering
$d \in \{1, \ldots, d_0\}$, where $d_0$ is a prespecified positive integer.
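
The regime structure of Eq. (8.44) can be sketched in a few lines of Python. The code below only classifies observations by the threshold variable $z_{t-d}$; it does not estimate the regime-specific coefficients. The function names and the threshold values in the example are illustrative assumptions.

```python
def regime(z_lag, gamma1, gamma2):
    """Return the regime (1, 2, or 3) selected by the threshold variable z_{t-d}."""
    if not gamma1 < gamma2:
        raise ValueError("thresholds must satisfy gamma1 < gamma2")
    if z_lag <= gamma1:
        return 1
    if z_lag <= gamma2:
        return 2
    return 3


def split_by_regime(z, d, gamma1, gamma2):
    """Group the time indices t = d, ..., len(z) - 1 by the regime of z[t - d]."""
    groups = {1: [], 2: [], 3: []}
    for t in range(d, len(z)):
        groups[regime(z[t - d], gamma1, gamma2)].append(t)
    return groups


# Hypothetical threshold variable and thresholds, with delay d = 1.
z = [-0.05, 0.0, 0.05, 0.01]
print(split_by_regime(z, 1, gamma1=-0.02, gamma2=0.04))
```

Once the indices are grouped, each regime can be fitted separately, for example by least squares.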

Model (8.44) is a generalization of the threshold autoregressive model of
Chapter 4. It is also a generalization of the error correction model of Eq.
(8.33). As mentioned earlier, an arbitrage trading is profitable only when $z_t^*$ or,
equivalently, $z_t$ is large in modulus. Therefore, arbitrage tradings only occur in
regimes 1 and 3 of model (8.44). As such, the dynamic relationship between $f_{t,\ell}$
and $s_t$ in regime 2 is determined mainly by the normal market force, and hence
the two series behave more or less like a random walk. In other words, the two
log prices in the middle regime should be free from arbitrage effects and, hence,
free from the cointegration constraint. From an econometric viewpoint, this means
that the estimate of $\beta_2$ in the middle regime should be insignificant.

In summary, we expect that the cointegration effects between the log price of 
the futures and the log price of security index on the cash market are significant 
in regimes 1 and 3, but insignificant in regime 2. This phenomenon is referred to 
as a threshold cointegration; see Balke and Fomby (1997). 
Figure 8.16 Time plots of 1-minute log returns of S&P 500 index futures and cash prices and
associated threshold variable in May 1993: (a) log returns of index futures, (b) log returns of index cash
prices, and (c) $z_t$ series.


8.7.2 The Data 


The data used in this case study are the intraday transaction data of the S&P 500 
index in May 1993 and its June futures contract traded at the Chicago Mercantile 
Exchange; see Forbes, Kalb, and Kofman (1999), who used the data to construct a 
minute-by-minute bivariate price series with 7060 observations. To avoid the undue 
influence of unusual returns, I replaced 10 extreme values (5 on each side) by the 
simple average of their two nearest neighbors. This step does not affect the qual- 
itative conclusion of the analysis but may affect the conditional heteroscedasticity 
in the data. For simplicity, we do not consider conditional heteroscedasticity in the 
study. Figure 8.16 shows the time plots of the log returns of the index futures and
cash prices and the associated threshold variable $z_t = 100 z_t^*$ of model (8.43).


8.7.3 Estimation 


A formal specification of the multivariate threshold model in Eq. (8.44) includes
selecting the threshold variable, determining the number of regimes, and choosing
the order $p$ for each regime. Interested readers are referred to Tsay (1998) and
Forbes, Kalb, and Kofman (1999). The thresholds $\gamma_1$ and $\gamma_2$ can be estimated
by using some information criteria [e.g., the Akaike information criterion (AIC)
or the sum of squares of residuals]. Assuming $p = 8$, $d \in \{1, 2, 3, 4\}$,
$\gamma_1 \in [-0.15, -0.02]$, and $\gamma_2 \in [0.025, 0.145]$, and using a grid search method with
300 points on each of the two intervals, the AIC selects $z_{t-1}$ as the threshold
variable with thresholds $\hat{\gamma}_1 = -0.0226$ and $\hat{\gamma}_2 = 0.0377$. Details of the parameter
estimates are given in Table 8.8.
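
The grid search over the two thresholds can be sketched as follows. This is a deliberately simplified illustration, not the procedure used for Table 8.8: each regime is fitted by its own mean rather than by a vector model with $p = 8$, the criterion is the sum of squares of residuals rather than the AIC, the data are simulated, and all function names are hypothetical. A coarser grid than 300 points is used to keep the example fast.

```python
import random


def linspace(lo, hi, n):
    """Return n equally spaced grid points on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


def ssr_three_regimes(y, z_lag, g1, g2, min_points=5):
    """Sum of squared residuals when each regime is fitted by its own mean."""
    total = 0.0
    bounds = ((float("-inf"), g1), (g1, g2), (g2, float("inf")))
    for lo, hi in bounds:
        values = [yi for yi, zi in zip(y, z_lag) if lo < zi <= hi]
        if len(values) < min_points:      # skip thresholds that leave a regime nearly empty
            return float("inf")
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return total


def grid_search(y, z_lag, interval1, interval2, n_points):
    """Return (criterion, gamma1, gamma2) minimizing the criterion over the grid."""
    best = (float("inf"), None, None)
    for g1 in linspace(interval1[0], interval1[1], n_points):
        for g2 in linspace(interval2[0], interval2[1], n_points):
            crit = ssr_three_regimes(y, z_lag, g1, g2)
            if crit < best[0]:
                best = (crit, g1, g2)
    return best


# Simulated data with (assumed) true thresholds -0.05 and 0.08.
random.seed(1)
z_lag = [random.uniform(-0.2, 0.2) for _ in range(300)]
y = []
for zi in z_lag:
    level = -1.0 if zi <= -0.05 else (0.0 if zi <= 0.08 else 1.0)
    y.append(level + random.gauss(0.0, 0.1))

crit, g1_hat, g2_hat = grid_search(y, z_lag, (-0.15, -0.02), (0.025, 0.145), 27)
print(round(g1_hat, 3), round(g2_hat, 3))
```

In a full application the inner criterion would be replaced by the AIC of the fitted three-regime model, and the search would be repeated for each candidate delay $d$.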
From Table 8.8, we make the following observations. First, the $t$ ratios of $\hat{\beta}_2$ in
the middle regime show that, as expected, the estimates are insignificant at the 5%
level, confirming that there is no cointegration between the two log prices in the
absence of arbitrage opportunities. Second, $\Delta f_t$ depends negatively on $\Delta f_{t-1}$ in all
three regimes. This is in agreement with the bid-ask bounce discussed in Chapter 5.
Third, past log returns of the index futures seem to be more informative than the
past log returns of the cash prices because there are more significant $t$ ratios in
$\Delta f_{t-i}$ than in $\Delta s_{t-i}$. This is reasonable because futures series are in general more
liquid. For more information on index arbitrage, see Dwyer, Locke, and Yu (1996).


8.8 PAIRS TRADING 


Pairs trading is a market-neutral trading strategy. There are several versions of pairs 
trading in the equity markets. In this section, we focus on the statistical arbitrage 
pairs trading, which makes use of the ideas of cointegration and error correction 
model discussed in the chapter. Our discussion will be brief. For more information 
concerning pairs trading and statistical arbitrage, see Vidyamurthy (2004) and Pole 
(2007). 

The general theme for trading in the equity markets is to buy undervalued stocks 
and sell overvalued ones. However, the true price of a stock is hard to assess. 
Pairs trading attempts to resolve this difficulty using the idea of relative pricing. 
Based on the arbitrage pricing theory (APT) in finance, if two stocks have similar 
characteristics, then the prices of both stocks must be more or less the same. If 
the prices differ, then it is likely that one of the stocks is overpriced and the other 
underpriced. Pairs trading involves selling the higher priced stock and buying the 
lower priced stock with the hope that the mispricing will correct itself in the future. 
Note that the true prices of the two stocks are not important. The observed prices 
may be wrong. What is important is that the observed prices be the same. The 
gap (properly scaled) between the two observed prices is called the spread. For 
pairs trading, the greater the spread, the larger the magnitude of mispricing and the 
greater the profit potential. Before discussing a trading strategy, we first introduce 
the theoretical framework. 


TABLE 8.8 Least-Squares Estimates and Their t Ratios of Multivariate Threshold
Model in Eq. (8.44) for S&P 500 Index Data in May 1993^a

                   Regime 1               Regime 2               Regime 3
                Δf_t       Δs_t        Δf_t       Δs_t        Δf_t       Δs_t
φ_0          0.00002    0.00005     0.00000    0.00000    -0.00001   -0.00005
t             (1.47)     (7.64)     (-0.07)     (0.53)     (-0.74)    (-6.37)
Δf_{t-1}    -0.08468    0.07098    -0.03861    0.04037    -0.04102    0.02305
t            (-3.83)     (6.15)     (-1.53)     (3.98)     (-1.72)     (1.96)
Δf_{t-2}    -0.00450    0.15899     0.04478    0.08621    -0.02069    0.09898
t            (-0.20)    (13.36)      (1.85)     (8.88)     (-0.87)     (8.45)
Δf_{t-3}     0.02274    0.11911     0.07251    0.09752     0.00365    0.08455
t             (0.95)     (9.53)      (3.08)    (10.32)      (0.15)     (7.02)
Δf_{t-4}     0.02429    0.08141     0.01418    0.06827    -0.02759    0.07699
t             (0.99)     (6.35)      (0.60)     (7.24)     (-1.13)     (6.37)
Δf_{t-5}     0.00340    0.08936     0.01185    0.04831    -0.00638    0.05004
t             (0.14)     (7.10)      (0.51)     (5.13)     (-0.26)     (4.07)
Δf_{t-6}     0.00098    0.07291     0.01251    0.03580    -0.03941    0.02615
t             (0.04)     (5.64)      (0.54)     (3.84)     (-1.62)     (2.18)
Δf_{t-7}    -0.00372    0.05201     0.02989    0.04837    -0.02031    0.02293
t            (-0.15)     (4.01)      (1.34)     (5.42)     (-0.85)     (1.95)
Δf_{t-8}     0.00043    0.00954     0.01812    0.02196    -0.04422    0.00462
t             (0.02)     (0.76)      (0.85)     (2.57)     (-1.90)     (0.40)
Δs_{t-1}    -0.08419    0.00264    -0.07618   -0.05633     0.06664    0.11143
t            (-2.01)     (0.12)     (-1.70)    (-3.14)      (1.49)     (5.05)
Δs_{t-2}    -0.05103    0.00256    -0.10920   -0.01521     0.04099   -0.01179
t            (-1.18)     (0.11)     (-2.59)    (-0.90)      (0.92)    (-0.53)
Δs_{t-3}     0.07275   -0.03631    -0.00504    0.01174    -0.01948   -0.01829
t             (1.65)    (-1.58)     (-0.12)     (0.71)     (-0.44)    (-0.84)
Δs_{t-4}     0.04706    0.01438     0.02751    0.01490     0.01646    0.00367
t             (1.03)     (0.60)      (0.71)     (0.96)      (0.37)     (0.17)
Δs_{t-5}     0.08118    0.02111     0.03943    0.02330    -0.03430   -0.00462
t             (1.77)     (0.88)      (0.97)     (1.43)     (-0.83)    (-0.23)
Δs_{t-6}     0.04390    0.04569     0.01690    0.01919     0.06084   -0.00392
t             (0.96)     (1.92)      (0.44)     (1.25)      (1.45)    (-0.19)
Δs_{t-7}    -0.03033    0.02051    -0.08647    0.00270    -0.00491    0.03597
t            (-0.70)     (0.91)     (-2.09)     (0.16)     (-0.13)     (1.90)
Δs_{t-8}    -0.02920    0.03018     0.01887   -0.00213     0.00030    0.02171
t            (-0.68)     (1.34)      (0.49)    (-0.14)      (0.01)     (1.14)
z_{t-1}      0.00024    0.00097    -0.00010    0.00012     0.00025    0.00086
t             (1.34)    (10.47)     (-0.30)     (0.86)      (1.41)     (9.75)

^a The numbers of data points for the three regimes are 2234, 2410, and 2408, respectively.


8.8.1 Theoretical Framework


Consider two stocks. Let $P_{it}$ be the observed price of stock $i$ at time $t$ and
$p_{it} = \ln(P_{it})$ be the corresponding log price. As mentioned in earlier chapters, it is
reasonable to assume that $p_{it}$ is unit-root nonstationary and follows a random-walk
model; that is, $p_{it} = p_{i,t-1} + r_{it}$, where $\{r_{it}\}$ is the return and forms a sequence
of uncorrelated innovations. If the two stocks have similar risk factors, then they
should have similar returns based on APT. Therefore, $p_{1t}$ and $p_{2t}$ are likely to be
driven by a common component and are cointegrated. In other words, there exists a
linear combination $w_t = p_{1t} - \gamma p_{2t}$, which is unit-root stationary and, hence, mean
reverting. The two price series $\{p_{1t}\}$ and $\{p_{2t}\}$ thus assume an error correction form

$$\begin{bmatrix} p_{1t} - p_{1,t-1} \\ p_{2t} - p_{2,t-1} \end{bmatrix} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix}(w_{t-1} - \mu_w) + \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix}, \tag{8.45}$$

where $\mu_w = E(w_t)$ denotes the mean of $w_t$. The four parameters $\gamma$, $\mu_w$, $\alpha_1$, and
$\alpha_2$ can be estimated, for instance, by the maximum-likelihood or least-squares
methods; see Section 8.6.2. We refer to the stationary series $w_t$ as the spread
between the two log stock prices.

The left-hand side of Eq. (8.45) consists of the log returns of the two stocks. The
equation says that the returns depend on $w_{t-1}$, which is stationary. Specifically,
$w_{t-1} - \mu_w$ denotes the deviation from the long-run equilibrium between the two
stocks. Equation (8.45) shows that, for cointegrated stocks, the returns depend on
the past deviation from equilibrium. The coefficients $\alpha_1$ and $\alpha_2$ show the effect of
past deviation on the returns $r_{1t}$ and $r_{2t}$, respectively. In practice, $\alpha_1$ and $\alpha_2$ should
have opposite signs, indicating reversion to the equilibrium.

Next, consider a portfolio that is long one share of stock 1 and short $\gamma$ shares of
stock 2. The return of the portfolio for a given time period $i$ is

$$\begin{aligned} r_{p,t+i} &= (p_{1,t+i} - p_{1t}) - \gamma (p_{2,t+i} - p_{2t}) \\ &= (p_{1,t+i} - \gamma p_{2,t+i}) - (p_{1t} - \gamma p_{2t}) \\ &= w_{t+i} - w_t. \end{aligned}$$

Therefore, the return $r_{p,t+i}$ of the portfolio is the increment of the spread in the
time period. As expected, the return of the portfolio does not depend on the mean
of $w_t$.
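
The identity between the portfolio return and the increment of the spread can be checked numerically. In the Python sketch below, the prices and the value of $\gamma$ are hypothetical and the function names are illustrative.

```python
import math


def spread(p1, p2, gamma):
    """Spread w = p1 - gamma * p2 between two log prices."""
    return p1 - gamma * p2


def portfolio_log_return(p1_start, p2_start, p1_end, p2_end, gamma):
    """Return of a portfolio long one share of stock 1 and short gamma shares of stock 2."""
    return (p1_end - p1_start) - gamma * (p2_end - p2_start)


# Hypothetical prices at times t and t + i, and a hypothetical gamma.
gamma = 0.7
p1_t, p2_t = math.log(20.0), math.log(10.0)
p1_ti, p2_ti = math.log(21.0), math.log(10.2)

r_p = portfolio_log_return(p1_t, p2_t, p1_ti, p2_ti, gamma)
dw = spread(p1_ti, p2_ti, gamma) - spread(p1_t, p2_t, gamma)
print(r_p, dw)   # the two numbers agree up to rounding error
```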


8.8.2 Trading Strategy 


The idea behind a pairs-trading strategy is to trade on the oscillations about the
equilibrium value of the spread. The oscillations in spread occur because the spread
is mean reverting. Since the equilibrium value is the mean of $w_t$, that is, $\mu_w$, we
can put on a trade when $w_t$ deviates substantially from its mean and unwind the
trade when the equilibrium is restored. In practice, how big the deviation needs
to be in order for the trading to be profitable depends on several factors. Trading
costs, marginal interest rates, and bid-ask spreads of the two stocks are three
obvious factors. Mathematically, let $\eta$ be the cost involved in carrying out a pairs
trading. Let $\Delta$ be a target deviation of $w_t$ from its mean $\mu_w$ for pairs trading. Then,
conditioned on $2\Delta > \eta$, a simple trading strategy is as follows:

• Buy a share of stock 1 and short $\gamma$ shares of stock 2 at time $t$ if
$w_t = p_{1t} - \gamma p_{2t} = \mu_w - \Delta$.

• Unwind the position at time $t + i$ ($i > 0$) if
$w_{t+i} = p_{1,t+i} - \gamma p_{2,t+i} = \mu_w + \Delta$.

One can identify the time point $t$ so long as $\Delta$ is not too large compared with
the standard deviation of $w_t$. The time point $t + i$ will occur because of the mean
reverting of the spread series. In this particular instance, the return of the portfolio
is $w_{t+i} - w_t = 2\Delta$ and the net profit of the trade is $2\Delta - \eta > 0$.
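
A minimal Python sketch of this trading rule is given below. Because an observed spread rarely hits the boundaries exactly, the sketch opens a position when the spread is at or below the lower boundary and unwinds it when the spread is at or above the upper boundary; this inequality version, the function name, and the spread values are assumptions made for illustration.

```python
def pairs_trades(w, mu, delta, eta):
    """Apply the long-spread rule to a spread series w.

    Open when w_t <= mu - delta and unwind when w_{t+i} >= mu + delta.
    Returns a list of (open_index, close_index, net_profit) tuples.
    """
    if 2 * delta <= eta:
        raise ValueError("the rule requires 2 * delta > eta")
    trades = []
    open_index = None
    for t, value in enumerate(w):
        if open_index is None and value <= mu - delta:
            open_index = t                       # buy stock 1, short gamma shares of stock 2
        elif open_index is not None and value >= mu + delta:
            gross = value - w[open_index]        # increment of the spread, at least 2 * delta
            trades.append((open_index, t, gross - eta))
            open_index = None
    return trades


# Hypothetical spread series with mean 0, target deviation 0.045, and cost 0.01.
w = [0.0, -0.05, -0.02, 0.03, 0.05, 0.0, -0.06, 0.05]
for trade in pairs_trades(w, mu=0.0, delta=0.045, eta=0.01):
    print(trade)
```

Each completed trade earns at least $2\Delta - \eta$ because the position is opened at or below $\mu_w - \Delta$ and closed at or above $\mu_w + \Delta$.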


Discussion. The aforementioned trading strategy is just one of many possibilities.
For instance, if $\Delta > \eta$, one can unwind the position when $w_{t+i} = \mu_w$. The
net profit of the pairs trading then is $\Delta - \eta > 0$. This may result in more
transactions and trading costs, but it shortens the holding period of the portfolio. If $\Delta$
is negative, then one can short one share of stock 1 and buy $\gamma$ shares of stock 2
to make a net profit $-2\Delta - \eta$. The quantity $\eta$ is the threshold for trading and is
likely to depend on several factors such as transaction fees and bid-ask spreads of
the two stocks.


8.8.3 Simple Illustration 


To demonstrate pairs trading, we consider two stocks traded on the New York
Stock Exchange. The two companies are BHP Billiton Ltd. of Australia and
Vale S.A. of Brazil, with stock symbols BHP and VALE, respectively. BHP of
Australia is a natural resources company with business in Australia, the Americas,
and Southern Africa. Vale of Brazil is a worldwide metals and mining company.
Thus, both multinational companies belong to the natural resources industry and
encounter similar risk factors. The daily prices of the two stocks were downloaded
from Yahoo Finance, and we employ adjusted closing prices from July 1, 2002, to
March 31, 2006, in our study.

Figure 8.17 shows the time plots of the daily log prices of the two stocks
(adjusted closing prices). The upper plot is for the BHP stock. From the plots, the
prices of the two stocks exhibit certain characteristics of comovement. Let $p_{1t}$ and
$p_{2t}$ be the daily log closing prices of BHP and VALE, respectively. We analyze
the series using both the least-squares and maximum-likelihood methods.


Least-Squares Estimation 

A simple way to verify that the two stocks are suitable for pairs trading is to check
the cointegration of their log stock prices. To this end, we consider the simple
linear regression $p_{1t} = \beta_0 + \beta_1 p_{2t} + w_t$, where $w_t$ denotes the residual series. For
the BHP and VALE stocks, we have

$$p_{1t} = 1.823 + 0.717 p_{2t} + \hat{w}_t, \qquad \hat{\sigma}_w = 0.044.$$

Figure 8.17 Daily log (adjusted) closing prices of BHP and VALE stocks from July 1, 2002, to March
31, 2006. Upper plot is for BHP stock.


Figure 8.18(a) shows the time plot of the residual series $\hat{w}_t$. The plot shows that the
residual series has certain characteristics of a stationary time series. In particular, it
has mean zero and fluctuates around its mean within a fixed range. Figure 8.18(b)
gives the sample ACF of $\hat{w}_t$. The ACFs decay exponentially, supporting that $\hat{w}_t$
is indeed stationary. To further confirm the stationarity assertion, we fit an AR(2)
model to $\hat{w}_t$ and obtain

$$(1 - 0.805B - 0.122B^2)\hat{w}_t = a_t, \qquad \hat{\sigma}_a = 0.018.$$

Following the discussion of Chapter 2, we can obtain the two characteristic
roots of the fitted AR(2) model. Indeed, the model can be rewritten as
$(1 - 0.935B)(1 + 0.130B)\hat{w}_t = a_t$. Both inverse roots are less than 1 in modulus.
Hence, $\hat{w}_t$ is stationary. Finally, we conduct
an augmented Dickey-Fuller unit-root test on $\hat{w}_t$ using an AR(2) model and find
that the test statistic is -6.04 with a p value of 0.01. The unit-root hypothesis is
clearly rejected.
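
The factorization of the fitted AR(2) polynomial can be verified with a short calculation. The Python sketch below solves the quadratic equation for the inverse characteristic roots using the rounded coefficients 0.805 and 0.122; the function name is illustrative.

```python
import cmath


def ar2_inverse_roots(phi1, phi2):
    """Inverse characteristic roots of 1 - phi1*B - phi2*B^2.

    They are the solutions of x^2 - phi1*x - phi2 = 0; the AR(2) model is
    stationary when both solutions are less than 1 in modulus.
    """
    disc = cmath.sqrt(phi1 * phi1 + 4.0 * phi2)
    return (phi1 + disc) / 2.0, (phi1 - disc) / 2.0


roots = ar2_inverse_roots(0.805, 0.122)
print([round(abs(r), 3) for r in roots])   # moduli of the two inverse roots
print(all(abs(r) < 1 for r in roots))      # stationarity check
```

The moduli agree, up to rounding of the coefficients, with the values returned by `1/Mod(x)` in the R demonstration at the end of this section.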


Maximum-Likelihood Estimation 

A formal approach to verify the cointegration of the two log stock prices is to
perform a cointegration test. Let $x_t = (p_{1t}, p_{2t})'$. Using information criteria, a VAR(2)
model is specified for $x_t$. We then conduct cointegration tests with restricted and
unrestricted constant. Both tests give similar results so that we only report the
results for the case of restricted constant.

Figure 8.18 Results of least-squares estimation: (a) Time plot of the estimated spread between BHP
and VALE daily log stock prices. (b) Sample autocorrelation functions of estimated spread.


> coint2=coint(xt,trend="rc")
> coint2
coint(Y = xt, trend = "rc")

Trend Specification:
H1*(r): Restricted constant

Trace tests signif. at the 5% level are flagged by ' +'.
Trace tests signif. at the 1% level are flagged by '++'.
Max Eigenvalue tests signif. at the 5% level are flagged by ' *'.
Max Eigenvalue tests signif. at the 1% level are flagged by '**'.

Tests for Cointegration Rank:
         Eigenvalue TraceSt 95%-CV 99%-CV Max-St 95%-CV 99%-CV
H(0)++**     0.0415 47.7400 19.960 24.600 39.965 15.670 20.200
H(1)         0.0082  7.7748  9.240 12.970  7.774  9.240 12.970


The test confirms that $x_t$ is cointegrated. Next, we perform the maximum-likelihood
estimation of the error correction model. The results are given below:


> n3=VECM(coint2)
> summary(n3)
VECM(test = coint2)

Cointegrating Vectors:
             coint.1
bhp           1.0000

vale         -0.7177
(std.err)     0.0112
(t.stat)    -64.0913

Intercept*   -1.8144
(std.err)     0.0169
(t.stat)   -107.0430

VECM Coefficients:
               bhp     vale
coint.1    -0.0671   0.0263
(std.err)   0.0145   0.0168
(t.stat)   -4.6462   1.5659

bhp.lag1   -0.1119   0.0659
(std.err)   0.0366   0.0425
(t.stat)   -3.0596   1.5516

vale.lag1   0.0732   0.0445
(std.err)   0.0320   0.0371
(t.stat)    2.2920   1.1986

Regression Diagnostics:
                  bhp    vale
R-squared      0.0370  0.0104
Adj. R-squared 0.0350  0.0083
Resid. Scale   0.0193  0.0224


Based on the estimation result, we have the model

$$\Delta x_t = \begin{bmatrix} -0.067 \\ 0.026 \end{bmatrix}(w_{t-1} - 1.81) + \begin{bmatrix} -0.11 & 0.07 \\ 0.07 & 0.04 \end{bmatrix}\Delta x_{t-1} + a_t,$$

where the estimated standard errors of $a_{it}$ are 0.019 and 0.022, respectively. In
addition, the spread series is $w_t = p_{1t} - 0.718 p_{2t}$, which is stationary with mean
1.81. Clearly, the result is very close to that of the least-squares estimation. In
particular, the $\gamma$ parameter for the pairs trading is $\gamma = 0.718$. Also, as expected,
$\alpha_1$ is negative whereas $\alpha_2$ is positive.


Trading Strategy 


Since the standard error of the spread series $w_t$ is 0.044, we can select $\Delta = 0.045$,
which is slightly greater than one standard error of $w_t$, for pairs trading. This choice
of $\Delta$ ensures that the probability for the spread $w_t$ to deviate $\Delta$ away from its mean
is not small. In fact, under the normality assumption, the probability is about 30%.
Figure 8.19 shows the time plot of the spread series $w_t$ of the fitted error correction
model. Three horizontal lines are imposed on the plot. They are $\mu_w$, $\mu_w + 0.045$,
and $\mu_w - 0.045$ with the latter two serving as boundaries for pairs trading. Since
$w_t$ varies from the lower boundary to the upper one (or from the upper boundary
to the lower one) several times, there are many pairs-trading opportunities. From
the discussion of Section 8.8.2, the log return of each pairs trading is $2\Delta = 0.09$,
which is not small. A more realistic demonstration is to implement the trading in
an out-of-sample period. However, the example shows that pairs trading is feasible.

Figure 8.19 Time plot of fitted spread series between daily log prices of BHP and VALE stocks.
Three horizontal lines denote $\mu_w$, $\mu_w + 0.045$, and $\mu_w - 0.045$ with $\mu_w = E(w_t)$.
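
The probability quoted above can be reproduced with the normal distribution function. The Python sketch below treats the deviation as two-sided, which is an assumption about how the 30% figure is defined; the function name is illustrative.

```python
import math


def prob_outside_band(delta, sigma):
    """P(|w - mu| > delta) for a normally distributed spread with standard deviation sigma."""
    z = delta / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF at z
    return 2.0 * (1.0 - cdf)


print(round(prob_outside_band(0.045, 0.044), 3))
```

A larger $\Delta$ raises the return per trade but lowers this probability, so fewer trading opportunities arise.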

Finally, an important question in pairs trading is to identify the cointegrated 
pairs of stocks. There are some procedures available in the literature. It seems 
reasonable to consider pairs of stocks that have similar risk factors. In other words, 
one should make use of finance theory to guide the selection. 


R Demonstration 
The following output has been edited: 


> library(urca)
> help(ca.jo)
> da=read.table("d-bhp0206.txt",header=T)
> da1=read.table("d-vale0206.txt",header=T)
> bhp=log(da[,9])
> vale=log(da1[,9])
> m1=lm(bhp~vale)
> summary(m1)

Call:
lm(formula = bhp ~ vale)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.822648   0.003662   497.7   <2e-16 ***
vale        0.716664   0.002354   304.4   <2e-16 ***

Residual standard error: 0.04421 on 944 degrees of freedom
Multiple R-squared: 0.9899, Adjusted R-squared: 0.9899
F-statistic: 9.266e+04 on 1 and 944 DF, p-value: < 2.2e-16


> wt=m1$residuals
> m3=arima(wt,order=c(2,0,0),include.mean=F)
> m3

Call:
arima(x = wt, order = c(2, 0, 0), include.mean = F)

Coefficients:
         ar1     ar2
      0.8051  0.1219
s.e.  0.0322  0.0325

sigma^2 estimated as 0.0003326: log likelihood=2444.76
> p1=c(1,-m3$coef)
> x=polyroot(p1)
> x
[1] 1.069100+0i -7.675365-0i
> 1/Mod(x)
[1] 0.9353661 0.1302870


> xt=cbind(bhp,vale)
> mm=ar(xt)
> mm$order
[1] 2
> cot=ca.jo(xt,ecdet="const",type='trace',K=2,spec='transitory')
> summary(cot)

######################
# Johansen-Procedure #
######################

Test type: trace statistic, without linear trend and constant in cointegration

Eigenvalues (lambda):
[1] 4.148282e-02 8.206470e-03 -4.610389e-18

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  7.78  7.52  9.24 12.97
r = 0  | 47.77 17.85 19.96 24.60

Eigenvectors, normalised to first column:
(These are the cointegration relations)

            bhp.l1    vale.l1  constant
bhp.l1    1.000000  1.0000000  1.000000
vale.l1  -0.717704 -0.7327542  2.047274
constant -1.828460 -1.5411890 -5.712629

Weights W:
(This is the loading matrix)

            bhp.l1     vale.l1     constant
bhp.d  -0.06731196 0.004568985 9.341093e-18
vale.d  0.02545606 0.007541565 1.015639e-18


> col=ca.jo(xt,ecdet="const",type='eigen',K=2,spec='transitory')
> summary(col)

######################
# Johansen-Procedure #
######################

Test type: maximal eigenvalue statistic (lambda max), without linear trend and constant in cointegration

Eigenvalues (lambda):
[1] 4.148282e-02 8.206470e-03 -4.610389e-18

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  7.78  7.52  9.24 12.97
r = 0  | 40.00 13.75 15.67 20.20

Eigenvectors, normalised to first column:
(These are the cointegration relations)

            bhp.l1    vale.l1  constant
bhp.l1    1.000000  1.0000000  1.000000
vale.l1  -0.717704 -0.7327542  2.047274
constant -1.828460 -1.5411890 -5.712629

Weights W:
(This is the loading matrix)

            bhp.l1     vale.l1     constant
bhp.d  -0.06731196 0.004568985 9.341093e-18
vale.d  0.02545606 0.007541565 1.015639e-18


APPENDIX A: REVIEW OF VECTORS AND MATRICES 


In this appendix, we briefly review some algebra and properties of vectors and 
matrices. No proofs are given as they can be found in standard textbooks on 
matrices (e.g., Graybill, 1969). 

An $m \times n$ real-valued matrix is an $m$ by $n$ array of real numbers. For example,

$$A = \begin{bmatrix} 2 & 5 & 8 \\ -1 & 3 & 4 \end{bmatrix}$$

is a $2 \times 3$ matrix. This matrix has two rows and three columns. In general, an
$m \times n$ matrix is written as

$$A = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1,n-1} & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2,n-1} & a_{2n} \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{m,n-1} & a_{mn} \end{bmatrix}. \tag{8.46}$$

The positive integers $m$ and $n$ are the row dimension and column dimension of $A$.
The real number $a_{ij}$ is referred to as the $(i, j)$th element of $A$. In particular, the
elements $a_{ii}$ are the diagonal elements of the matrix.

An $m \times 1$ matrix forms an $m$-dimensional column vector, and a $1 \times n$ matrix
is an $n$-dimensional row vector. In the literature, a vector is often meant to be a
column vector. If $m = n$, then the matrix is a square matrix. If $a_{ij} = 0$ for $i \neq j$
and $m = n$, then the matrix $A$ is a diagonal matrix. If $a_{ij} = 0$ for $i \neq j$ and $a_{ii} = 1$
for all $i$, then $A$ is the $m \times m$ identity matrix, which is commonly denoted by $I_m$
or simply $I$ if the dimension is clear.

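The $n \times m$ matrix

$$A' = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m-1,1} & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m-1,2} & a_{m2} \\ \vdots & \vdots & & \vdots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{m-1,n} & a_{mn} \end{bmatrix}$$

is the transpose of the matrix $A$. For example,

$$\begin{bmatrix} 2 & -1 \\ 5 & 3 \\ 8 & 4 \end{bmatrix} \quad \text{is the transpose of} \quad \begin{bmatrix} 2 & 5 & 8 \\ -1 & 3 & 4 \end{bmatrix}.$$

We use the notation $A' = [a'_{ij}]$ to denote the transpose of $A$. From the definition,
$a'_{ij} = a_{ji}$ and $(A')' = A$. If $A' = A$, then $A$ is a symmetric matrix.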


Basic Operations 


Suppose that $A = [a_{ij}]_{m \times n}$ and $C = [c_{ij}]_{p \times q}$ are two matrices with dimensions
given in the subscript. Let $b$ be a real number. Some basic matrix operations are
defined next:


• Addition: $A + C = [a_{ij} + c_{ij}]_{m \times n}$ if $m = p$ and $n = q$.

• Subtraction: $A - C = [a_{ij} - c_{ij}]_{m \times n}$ if $m = p$ and $n = q$.

• Scalar multiplication: $bA = [b a_{ij}]_{m \times n}$.

• Multiplication: $AC = [\sum_{v=1}^{n} a_{iv} c_{vj}]_{m \times q}$ provided that $n = p$.


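When the dimensions of matrices satisfy the condition for multiplication to
take place, the two matrices are said to be conformable. An example of matrix
multiplication is

$$\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ -1 & 2 & -4 \end{bmatrix} = \begin{bmatrix} 2\cdot 1 - 1\cdot 1 & 2\cdot 2 + 1\cdot 2 & 2\cdot 3 - 1\cdot 4 \\ 1\cdot 1 - 1\cdot 1 & 1\cdot 2 + 1\cdot 2 & 1\cdot 3 - 1\cdot 4 \end{bmatrix} = \begin{bmatrix} 1 & 6 & 2 \\ 0 & 4 & -1 \end{bmatrix}.$$

Important rules of matrix operations include (a) $(AC)' = C'A'$ and (b) $AC \neq CA$
in general.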
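
The multiplication example and rule (a) can be checked with a few lines of Python. The helper functions below are illustrative and use plain nested lists rather than a matrix library.

```python
def matmul(a, c):
    """Product of an m x n matrix and an n x q matrix stored as nested lists."""
    if len(a[0]) != len(c):
        raise ValueError("matrices are not conformable")
    return [
        [sum(a[i][v] * c[v][j] for v in range(len(c))) for j in range(len(c[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Transpose of a matrix stored as nested lists."""
    return [list(column) for column in zip(*a)]


A = [[2, 1], [1, 1]]
C = [[1, 2, 3], [-1, 2, -4]]

print(matmul(A, C))
# Rule (a): the transpose of a product reverses the order of the factors.
print(transpose(matmul(A, C)) == matmul(transpose(C), transpose(A)))
```

Note that the product CA is not defined here because C has three columns and A has two rows, which illustrates why the order of multiplication matters.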


Inverse, Trace, Eigenvalue, and Eigenvector 


A square matrix $A_{m \times m}$ is nonsingular or invertible if there exists a unique matrix
$C_{m \times m}$ such that $AC = CA = I_m$, the $m \times m$ identity matrix. In this case, $C$ is
called the inverse matrix of $A$ and is denoted by $C = A^{-1}$.

The trace of $A_{m \times m}$ is the sum of its diagonal elements [i.e., $\mathrm{tr}(A) = \sum_{i=1}^{m} a_{ii}$].
It is easy to see that (a) tr(A + C) = tr(A) + tr(C), (b) tr(A) = tr(A'), and (c)
tr(AC) = tr(CA) provided that the two matrices are conformable.

A number $\lambda$ and an $m \times 1$ vector $b$, possibly complex valued, are a right
eigenvalue and eigenvector pair of the matrix $A$ if $Ab = \lambda b$. There are $m$ possible
eigenvalues for the matrix $A$. For a real-valued matrix $A$, complex eigenvalues
occur in conjugate pairs. The matrix $A$ is nonsingular if and only if all of its
eigenvalues are nonzero. Denote the eigenvalues by $\{\lambda_i \mid i = 1, \ldots, m\}$. We have
$\mathrm{tr}(A) = \sum_{i=1}^{m} \lambda_i$. In addition, the determinant of the matrix $A$ can be defined as
$|A| = \prod_{i=1}^{m} \lambda_i$. For a general definition of determinant of a matrix, see a standard
textbook on matrices (e.g., Graybill, 1969).

Finally, the rank of the matrix Amxn is the number of nonzero eigenvalues of 
the symmetric matrix AA’. Also, for a nonsingular matrix A, (ATIY = (A)! 
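These identities are easy to verify numerically. The sketch below uses base R only; the matrices `A` and `M` are illustrative choices and do not come from the text, and the tolerance used to decide which eigenvalues count as nonzero is an assumption.

```r
A <- matrix(c(4, 1, 0,
              2, 3, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)   # illustrative nonsingular matrix
lambda <- eigen(A)$values

tr_A     <- sum(diag(A))
trace_ok <- isTRUE(all.equal(tr_A, Re(sum(lambda))))      # tr(A) = sum of eigenvalues
det_ok   <- isTRUE(all.equal(det(A), Re(prod(lambda))))   # |A|   = product of eigenvalues
inv_ok   <- isTRUE(all.equal(t(solve(A)), solve(t(A))))   # (A^{-1})' = (A')^{-1}

# Rank of a 2 x 3 matrix as the number of nonzero eigenvalues of MM'
M  <- matrix(c(1, 2, 3,
               2, 4, 6), nrow = 2, byrow = TRUE)          # second row is twice the first
ev <- eigen(M %*% t(M), symmetric = TRUE)$values
rank_M <- sum(ev > 1e-10 * max(ev))                       # relative tolerance (assumed)
```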


Positive-Definite Matrix


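A square matrix $A$ ($m \times m$) is a positive-definite matrix if (a) $A$ is symmetric and
(b) all eigenvalues of $A$ are positive. Alternatively, $A$ is a positive-definite matrix
if for any nonzero $m$-dimensional vector $b$, we have $b'Ab > 0$.

Useful properties of a positive-definite matrix $A$ include (a) all eigenvalues of
$A$ are real and positive, and (b) the matrix can be decomposed as

$$A = P \Lambda P',$$

where $\Lambda$ is a diagonal matrix consisting of all eigenvalues of $A$ and $P$ is an $m \times m$
matrix consisting of the $m$ right eigenvectors of $A$. It is common to write the
eigenvalues as $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$ and the eigenvectors as $e_1, \ldots, e_m$ such that
$Ae_i = \lambda_i e_i$ and $e_i'e_i = 1$. In addition, these eigenvectors are orthogonal to each
other (namely, $e_i'e_j = 0$ if $i \neq j$) if the eigenvalues are distinct. The matrix
$P$ is an orthogonal matrix and the decomposition is referred to as the spectral
decomposition of the matrix $A$. Consider, for example, the simple $2 \times 2$ matrix

$$\Sigma = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix},$$

which is positive definite. Simple calculations show that

$$\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$

Therefore, 3 and 1 are eigenvalues of $\Sigma$ with normalized eigenvectors
$(1/\sqrt{2}, 1/\sqrt{2})'$ and $(1/\sqrt{2}, -1/\sqrt{2})'$, respectively. It is easy to verify that the
spectral decomposition holds, that is,

$$\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.$$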
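The spectral decomposition of the $2 \times 2$ example can be reproduced with the base R function `eigen()`. This is a small sketch; the signs of the computed eigenvectors may differ from those shown above, which does not affect the decomposition.

```r
Sigma <- matrix(c(2, 1,
                  1, 2), nrow = 2, byrow = TRUE)

eg     <- eigen(Sigma, symmetric = TRUE)
P      <- eg$vectors        # columns are the normalized eigenvectors
Lambda <- diag(eg$values)   # eigenvalues are returned in decreasing order

recon <- P %*% Lambda %*% t(P)   # spectral decomposition: should reproduce Sigma
ortho <- t(P) %*% P              # P is orthogonal: should be the identity matrix
```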
For a symmetric matrix $A$, there exists a lower triangular matrix $L$ with diagonal
elements being 1 and a diagonal matrix $G$ such that $A = LGL'$; see Chapter 1
of Strang (1980). If $A$ is positive definite, then the diagonal elements of $G$ are
positive. In this case, we have

$$A = L\sqrt{G}\sqrt{G}L' = (L\sqrt{G})(L\sqrt{G})',$$

where $L\sqrt{G}$ is again a lower triangular matrix and the square root is taken element
by element. Such a decomposition is called the Cholesky decomposition of $A$. This
decomposition shows that a positive-definite matrix $A$ can be diagonalized as

$$L^{-1}A(L')^{-1} = L^{-1}A(L^{-1})' = G.$$

Since $L$ is a lower triangular matrix with unit diagonal elements, $L^{-1}$ is also a lower
triangular matrix with unit diagonal elements. Consider again the prior $2 \times 2$ matrix
$\Sigma$. It is easy to verify that

$$L = \begin{bmatrix} 1.0 & 0.0 \\ 0.5 & 1.0 \end{bmatrix} \quad \text{and} \quad G = \begin{bmatrix} 2.0 & 0.0 \\ 0.0 & 1.5 \end{bmatrix}$$

satisfy $\Sigma = LGL'$. In addition,

$$L^{-1} = \begin{bmatrix} 1.0 & 0.0 \\ -0.5 & 1.0 \end{bmatrix} \quad \text{and} \quad L^{-1}\Sigma(L^{-1})' = G.$$
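The matrices $L$ and $G$ can be recovered from a standard Cholesky routine. The sketch below assumes base R, whose `chol()` returns the upper triangular factor; its transpose plays the role of $L\sqrt{G}$, and rescaling its columns gives the unit-diagonal $L$.

```r
Sigma <- matrix(c(2, 1,
                  1, 2), nrow = 2, byrow = TRUE)

Cfac <- t(chol(Sigma))        # lower triangular, Sigma = Cfac %*% t(Cfac)
d    <- diag(Cfac)            # square roots of the diagonal elements of G
L    <- Cfac %*% diag(1 / d)  # rescale columns so that diag(L) = 1
G    <- diag(d^2)

Linv  <- solve(L)
diagG <- Linv %*% Sigma %*% t(Linv)   # should equal G
```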


Vectorization and Kronecker Product 


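Writing an $m \times n$ matrix $A$ in its columns as $A = [a_1, \ldots, a_n]$, we define the
stacking operation as $\mathrm{vec}(A) = (a_1', a_2', \ldots, a_n')'$, which is an $mn \times 1$ vector. For
two matrices $A_{m \times n}$ and $C_{p \times q}$, the Kronecker product between $A$ and $C$ is

$$A \otimes C = \begin{bmatrix} a_{11}C & a_{12}C & \cdots & a_{1n}C \\ a_{21}C & a_{22}C & \cdots & a_{2n}C \\ \vdots & \vdots & & \vdots \\ a_{m1}C & a_{m2}C & \cdots & a_{mn}C \end{bmatrix}_{mp \times nq}.$$

For example, assume that

$$A = \begin{bmatrix} 2 & 1 \\ -1 & 3 \end{bmatrix}, \qquad C = \begin{bmatrix} 4 & -1 & 3 \\ -2 & 5 & 2 \end{bmatrix}.$$

Then $\mathrm{vec}(A) = (2, -1, 1, 3)'$, $\mathrm{vec}(C) = (4, -2, -1, 5, 3, 2)'$, and

$$A \otimes C = \begin{bmatrix} 8 & -2 & 6 & 4 & -1 & 3 \\ -4 & 10 & 4 & -2 & 5 & 2 \\ -4 & 1 & -3 & 12 & -3 & 9 \\ 2 & -5 & -2 & -6 & 15 & 6 \end{bmatrix}.$$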


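Assuming that the dimensions are appropriate, we have the following useful properties
for the two operators:

1. $A \otimes C \neq C \otimes A$ in general.

2. $(A \otimes C)' = A' \otimes C'$.

3. $A \otimes (C + D) = A \otimes C + A \otimes D$.

4. $(A \otimes C)(F \otimes G) = (AF) \otimes (CG)$.

5. If $A$ and $C$ are invertible, then $(A \otimes C)^{-1} = A^{-1} \otimes C^{-1}$.

6. For square matrices $A$ and $C$, $\mathrm{tr}(A \otimes C) = \mathrm{tr}(A)\mathrm{tr}(C)$.

7. $\mathrm{vec}(A + C) = \mathrm{vec}(A) + \mathrm{vec}(C)$.

8. $\mathrm{vec}(ABC) = (C' \otimes A)\,\mathrm{vec}(B)$.

9. $\mathrm{tr}(AC) = \mathrm{vec}(C')'\mathrm{vec}(A) = \mathrm{vec}(A')'\mathrm{vec}(C)$.

10. $\mathrm{tr}(ABC) = \mathrm{vec}(A')'(C' \otimes I)\mathrm{vec}(B) = \mathrm{vec}(A')'(I \otimes B)\mathrm{vec}(C)$
$= \mathrm{vec}(B')'(A' \otimes I)\mathrm{vec}(C) = \mathrm{vec}(B')'(I \otimes C)\mathrm{vec}(A)$
$= \mathrm{vec}(C')'(B' \otimes I)\mathrm{vec}(A) = \mathrm{vec}(C')'(I \otimes A)\mathrm{vec}(B)$.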
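The example and a few of the properties can be checked with the base R function `kronecker()`. In this sketch the helpers `vec()` and `tr()` and the matrices `B` and `S` are hypothetical names introduced for illustration; they are not part of the text.

```r
vec <- function(M) matrix(as.vector(M), ncol = 1)   # stack the columns of M
tr  <- function(M) sum(diag(M))                     # trace of a square matrix

A <- matrix(c( 2, 1,
              -1, 3), nrow = 2, byrow = TRUE)
C <- matrix(c( 4, -1, 3,
              -2,  5, 2), nrow = 2, byrow = TRUE)

AkC <- kronecker(A, C)     # Kronecker product; A %x% C is equivalent

# Property 8: vec(ABC) = (C' (x) A) vec(B), with a hypothetical 2 x 2 matrix B
B   <- matrix(c(1, 0, 2, -1), nrow = 2)
lhs <- vec(A %*% B %*% C)
rhs <- kronecker(t(C), A) %*% vec(B)

# Property 6: tr(A (x) S) = tr(A) tr(S), with a hypothetical square matrix S
S <- matrix(c(1, 2, 3, 4), nrow = 2)
prop6 <- isTRUE(all.equal(tr(kronecker(A, S)), tr(A) * tr(S)))
```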


In multivariate statistical analysis, we often deal with symmetric matrices. It 
is therefore convenient to generalize the stacking operation to the half-stacking 
operation, which consists of elements on or below the main diagonal. Specifically, 
for a symmetric square matrix $A = [a_{ij}]_{k \times k}$, define

$$\mathrm{vech}(A) = (a_{1.}', a_{2*}', \ldots, a_{k*}')',$$

where $a_{1.}$ is the first column of $A$, and $a_{i*} = (a_{ii}, a_{i+1,i}, \ldots, a_{ki})'$ is a $(k - i + 1)$-dimensional
vector. The dimension of $\mathrm{vech}(A)$ is $k(k + 1)/2$. For example, suppose
that $k = 3$. Then we have $\mathrm{vech}(A) = (a_{11}, a_{21}, a_{31}, a_{22}, a_{32}, a_{33})'$, which is a six-dimensional
vector.
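A half-stacking operator takes one line in base R. The function name `vech()` below is our own (hypothetical) helper, and the $3 \times 3$ matrix is an illustrative example.

```r
# Half-stacking: elements on or below the main diagonal, column by column
vech <- function(A) {
  stopifnot(is.matrix(A), nrow(A) == ncol(A))
  A[lower.tri(A, diag = TRUE)]
}

A <- matrix(c(1, 2, 3,
              2, 4, 5,
              3, 5, 6), nrow = 3, byrow = TRUE)   # symmetric example with k = 3
v <- vech(A)    # (a11, a21, a31, a22, a32, a33)
```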


APPENDIX B: MULTIVARIATE NORMAL DISTRIBUTIONS 


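A $k$-dimensional random vector $\mathbf{x} = (x_1, \ldots, x_k)'$ follows a multivariate normal
distribution with mean $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_k)'$ and positive-definite covariance matrix
$\Sigma = [\sigma_{ij}]$ if its probability density function (pdf) is

$$f(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})\right]. \tag{8.47}$$

We use the notation $\mathbf{x} \sim N_k(\boldsymbol{\mu}, \Sigma)$ to denote that $\mathbf{x}$ follows such a distribution.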
This normal distribution plays an important role in multivariate statistical analysis 
and it has several nice properties. Here we consider only those properties that are 
relevant to our study. Interested readers are referred to Johnson and Wichern (1998) 
for details. 

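To gain insight into multivariate normal distributions, consider the bivariate case
(i.e., $k = 2$). In this case, we have

$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}, \qquad \Sigma^{-1} = \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2} \begin{bmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{bmatrix}.$$

Using the correlation coefficient $\rho = \sigma_{12}/(\sigma_1\sigma_2)$, where $\sigma_i = \sqrt{\sigma_{ii}}$ is the standard
deviation of $x_i$, we have $\sigma_{12} = \rho\sqrt{\sigma_{11}\sigma_{22}}$ and $|\Sigma| = \sigma_{11}\sigma_{22}(1 - \rho^2)$. The pdf of $\mathbf{x}$
then becomes

$$f(x_1, x_2 \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left(-\frac{1}{2(1 - \rho^2)}\left[Q(\mathbf{x}, \boldsymbol{\mu}, \Sigma)\right]\right),$$

where

$$Q(\mathbf{x}, \boldsymbol{\mu}, \Sigma) = \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 - 2\rho\left(\frac{x_1 - \mu_1}{\sigma_1}\right)\left(\frac{x_2 - \mu_2}{\sigma_2}\right).$$

Chapter 4 of Johnson and Wichern (1998) contains some plots of this pdf function.

Let $\mathbf{c} = (c_1, \ldots, c_k)'$ be a nonzero $k$-dimensional vector. Partition the random
vector as $\mathbf{x} = (\mathbf{x}_1', \mathbf{x}_2')'$, where $\mathbf{x}_1 = (x_1, \ldots, x_p)'$ and $\mathbf{x}_2 = (x_{p+1}, \ldots, x_k)'$ with
$1 \leq p < k$. Also partition $\boldsymbol{\mu}$ and $\Sigma$ accordingly as

$$\begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix} \sim N\left(\begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right).$$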
Some properties of x are as follows: 


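1. $\mathbf{c}'\mathbf{x} \sim N(\mathbf{c}'\boldsymbol{\mu}, \mathbf{c}'\Sigma\mathbf{c})$. That is, any nonzero linear combination of $\mathbf{x}$ is univariate
normal. The converse of this property also holds. Specifically, if $\mathbf{c}'\mathbf{x}$ is
univariate normal for any nonzero vector $\mathbf{c}$, then $\mathbf{x}$ is multivariate normal.

2. The marginal distribution of $\mathbf{x}_i$ is normal. In fact, $\mathbf{x}_i \sim N_{k_i}(\boldsymbol{\mu}_i, \Sigma_{ii})$ for $i =
1$ and 2, where $k_1 = p$ and $k_2 = k - p$.

3. $\Sigma_{12} = 0$ if and only if $\mathbf{x}_1$ and $\mathbf{x}_2$ are independent.

4. The random variable $y = (\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})$ follows a chi-squared distribution
with $k$ degrees of freedom.

5. The conditional distribution of $\mathbf{x}_1$ given $\mathbf{x}_2 = \mathbf{b}$ is also normally distributed
as

$$(\mathbf{x}_1 \mid \mathbf{x}_2 = \mathbf{b}) \sim N_p\left[\boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{b} - \boldsymbol{\mu}_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right].$$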


The last property is useful in many scientific areas. For instance, it forms the 
basis for time series forecasting under the normality assumption and for recursive 
least-squares estimation. 
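Because the conditional distribution is used so often in forecasting, a small numerical sketch may help. The function `cond_mvn()` below is a hypothetical helper written in base R; it assumes that the first `p` components of the vector form the first block and that the conditioning value `b` is supplied for the remaining components.

```r
# Conditional mean and covariance of x1 given x2 = b for x ~ N(mu, Sigma),
# where x1 holds the first p components (hypothetical helper)
cond_mvn <- function(mu, Sigma, p, b) {
  k   <- length(mu)
  i1  <- 1:p
  i2  <- (p + 1):k
  S12 <- Sigma[i1, i2, drop = FALSE]
  S22 <- Sigma[i2, i2, drop = FALSE]
  W   <- S12 %*% solve(S22)                 # Sigma_12 Sigma_22^{-1}
  list(mean = as.vector(mu[i1] + W %*% (b - mu[i2])),
       cov  = Sigma[i1, i1, drop = FALSE] - W %*% t(S12))
}

# Bivariate check with unit variances and correlation rho:
# the conditional mean is rho * b and the conditional variance is 1 - rho^2
rho <- 0.6
out <- cond_mvn(mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2), p = 1, b = 2)
```

In the bivariate case the conditional variance $1 - \rho^2$ does not depend on the observed value `b`; only the conditional mean does.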


APPENDIX C: SOME SCA COMMANDS 
The following SCA commands are used in the analysis of Example 8.6: 


input x1,x2. file 'm-gs1n3-5301.txt' % Load data
r1=ln(x1) % Take log transformation
r2=ln(x2)
miden r1,r2. no ccm. arfits 1 to 8.
-- % Denote the model by v21.
mtsm v21. series r1,r2. model (i-p1*b-p2*b**2)series= @
c+(i-t1*b)noise.
mestim v21. % Initial estimation
p1(2,1)=0 % Set zero constraints
ct1(2,1)=1
-- % Refine estimation and store residuals
mestim v21. method exact. hold resi(res1,res2)
miden res1,res2.


EXERCISES


8.1. Consider the monthly log stock returns, in percentages and including dividends,
of Merck & Company, Johnson & Johnson, General Electric, General
Motors, Ford Motor Company, and value-weighted index from January 1960
to December 2008; see the file m-mrk2vw.txt.

(a) Compute the sample mean, covariance matrix, and correlation matrix of
the data.

(b) Test the hypothesis $H_0: \rho_1 = \cdots = \rho_6 = 0$, where $\rho_i$ is the lag-$i$ cross-correlation
matrix of the data. Draw conclusions based on the 5% significance
level.

(c) Is there any lead-lag relationship among the six return series?

8.2. The Federal Reserve Bank of St. Louis publishes selected interest rates and
U.S. financial data on its website: http://research.stlouisfed.org/fred2/.
Consider the monthly 1-year and 10-year Treasury constant maturity rates from
April 1953 to October 2009 for 679 observations; see the file m-gs1n10.txt.
The rates are in percentages.

(a) Let $c_t = r_t - r_{t-1}$ be the change series of the monthly interest rate $r_t$.
Build a bivariate autoregressive model for the two change series. Discuss
the implications of the model. Transform the model into a structural form.

(b) Build a bivariate moving-average model for the two change series. Discuss
the implications of the model and compare it with the bivariate AR model
built earlier.

8.3. Again consider the monthly 1-year and 10-year Treasury constant maturity
rates from April 1953 to October 2009. Consider the log series of the data
and build a VARMA model for the series. Discuss the implications of the
model obtained.

8.4. Again consider the monthly 1-year and 10-year Treasury constant maturity
rates from April 1953 to October 2009. Are the two interest rate series
threshold cointegrated? Use the interest spread $s_t = r_{10,t} - r_{1,t}$ as the threshold
variable, where $r_{it}$ is the $i$-year Treasury constant maturity rate. If they
are threshold cointegrated, build a multivariate threshold model for the two
series.

8.5. The bivariate AR(4) model $x_t - \Phi_4 x_{t-4} = \phi_0 + a_t$ is a special seasonal model
with periodicity 4, where $\{a_t\}$ is a sequence of independent and identically
distributed normal random vectors with mean zero and covariance matrix $\Sigma$.
Such a seasonal model may be useful in studying quarterly earnings of a
company. (a) Assume that $x_t$ is weakly stationary. Derive the mean vector
and covariance matrix of $x_t$. (b) Derive the necessary and sufficient condition
of weak stationarity for $x_t$. (c) Show that $\Gamma_\ell = \Phi_4\Gamma_{\ell-4}$ for $\ell > 0$, where $\Gamma_\ell$
is the lag-$\ell$ autocovariance matrix of $x_t$.

8.6. The bivariate MA(4) model $x_t = a_t - \Theta_4 a_{t-4}$ is another seasonal model with
periodicity 4, where $\{a_t\}$ is a sequence of independent and identically distributed
normal random vectors with mean zero and covariance matrix $\Sigma$.
Derive the covariance matrices $\Gamma_\ell$ of $x_t$ for $\ell = 0, \ldots, 5$.

8.7. Consider the monthly U.S. 1-year and 3-year Treasury constant maturity rates
from April 1953 to March 2004. The data can be obtained from the Federal
Reserve Bank of St. Louis or from the file m-gs1n3-5304.txt (1-year, 3-year,
dates). See also Example 8.6, which uses a shorter data span. Here we
use the interest rates directly without the log transformation and define $x_t =
(x_{1t}, x_{2t})'$, where $x_{1t}$ is the 1-year maturity rate and $x_{2t}$ is the 3-year maturity
rate.

(a) Identify a VAR model for the bivariate interest rate series. Write down the
fitted model.

(b) Compute the impulse response functions of the fitted VAR model. It suffices
to use the first 6 lags.

(c) Use the fitted VAR model to produce 1-step- to 12-step-ahead forecasts
of the interest rates, assuming that the forecast origin is March 2004.

(d) Are the two interest rate series cointegrated, when a restricted constant
term is used? Use 5% significance level to perform the test.

(e) If the series are cointegrated, build an ECM for the series. Write down
the fitted model.

(f) Use the fitted ECM to produce 1-step- to 12-step-ahead forecasts of the
interest rates, assuming that the forecast origin is March 2004.

(g) Compare the forecasts produced by the VAR model and the ECM.


CHAPTER 9


Principal Component Analysis 
and Factor Models 


Most financial portfolios consist of multiple assets, and their returns depend con- 
currently and dynamically on many economic and financial variables. Therefore, it 
is important to use proper multivariate statistical analyses to study the behavior and 
properties of portfolio returns. However, as demonstrated in the previous chapter, 
analysis of multiple asset returns often requires high-dimensional statistical models 
that are complicated and hard to apply. To simplify the task of modeling multiple 
returns, we discuss in this chapter some dimension reduction methods to search 
for the underlying structure of the assets. Principal component analysis (PCA) is 
perhaps the most commonly used statistical method in dimension reduction, and 
we start our discussion with the method. In practice, observed return series often 
exhibit similar characteristics leading to the belief that they might be driven by 
some common sources, often referred to as common factors. To study the common 
pattern in asset returns and to simplify portfolio analysis, various factor models 
have been proposed in the literature to analyze multiple asset returns. The second 
goal of this chapter is to introduce some useful factor models and demonstrate their 
applications in finance. 

Three types of factor models are available for studying asset returns; see Connor 
(1995) and Campbell, Lo, and MacKinlay (1997). The first type is the macroeco- 
nomic factor models that use macroeconomic variables such as growth rate of GDP, 
interest rates, inflation rate, and unemployment rate to describe the common behav- 
ior of asset returns. Here the factors are observable and the model can be estimated 
via linear regression methods. The second type is the fundamental factor models 
that use firm or asset specific attributes such as firm size, book and market val- 
ues, and industrial classification to construct common factors. The third type is the 
statistical factor models that treat the common factors as unobservable or latent 
variables to be estimated from the return series. In this chapter, we discuss all
three types of factor models and their applications in finance. Principal component
analysis and factor models for asset returns are also discussed in Alexander (2001)
and Zivot and Wang (2003).

Analysis of Financial Time Series, Third Edition, By Ruey S. Tsay
Copyright © 2010 John Wiley & Sons, Inc.

The chapter is organized as follows. Section 9.1 introduces a general factor 
model for asset returns, and Section 9.2 discusses macroeconomic factor models 
with some simple examples. The fundamental factor model and its applications are 
given in Section 9.3. Section 9.4 introduces principal component analysis that serves 
as the basic method for statistical factor analysis. The PCA can also be used to 
reduce the dimension in multivariate analysis. Section 9.5 discusses the orthogonal 
factor models, including factor rotation and its estimation, and provides several 
examples. Finally, Section 9.6 introduces asymptotic principal component analysis. 


9.1 A FACTOR MODEL 


Suppose that there are $k$ assets and $T$ time periods. Let $r_{it}$ be the return of asset $i$
in the time period $t$. A general form for the factor model is

$$r_{it} = \alpha_i + \beta_{i1}f_{1t} + \cdots + \beta_{im}f_{mt} + \epsilon_{it}, \quad t = 1, \ldots, T; \quad i = 1, \ldots, k, \tag{9.1}$$

where $\alpha_i$ is a constant representing the intercept, $\{f_{jt} \mid j = 1, \ldots, m\}$ are $m$ common
factors, $\beta_{ij}$ is the factor loading for asset $i$ on the $j$th factor, and $\epsilon_{it}$ is the specific
factor of asset $i$.

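For asset returns, the factor $\mathbf{f}_t = (f_{1t}, \ldots, f_{mt})'$ is assumed to be an
$m$-dimensional stationary process such that

$$E(\mathbf{f}_t) = \boldsymbol{\mu}_f,$$

$$\mathrm{Cov}(\mathbf{f}_t) = \Sigma_f, \quad \text{an } m \times m \text{ matrix},$$

and the asset specific factor $\epsilon_{it}$ is a white noise series and uncorrelated with the
common factors $f_{jt}$ and other specific factors. Specifically, we assume that

$$E(\epsilon_{it}) = 0 \quad \text{for all } i \text{ and } t,$$

$$\mathrm{Cov}(f_{jt}, \epsilon_{is}) = 0 \quad \text{for all } j, i, t, \text{ and } s,$$

$$\mathrm{Cov}(\epsilon_{it}, \epsilon_{js}) = \begin{cases} \sigma_i^2, & \text{if } i = j \text{ and } t = s, \\ 0, & \text{otherwise.} \end{cases}$$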


Thus, the common factors are uncorrelated with the specific factors, and the specific 
factors are uncorrelated among each other. The common factors, however, need not 
be uncorrelated with each other in some factor models. 

In some applications, the number of assets $k$ may be larger than the number
of time periods $T$. We discuss an approach to analyze such data in Section 9.6. It
is also common to assume that the factors, hence $\mathbf{r}_t$, are serially uncorrelated in
factor analysis. In applications, if the observed returns are serially dependent, then
the models in Chapter 8 can be used to remove the serial dependence.

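In matrix form, the factor model in Eq. (9.1) can be written as

$$r_{it} = \alpha_i + \boldsymbol{\beta}_i\mathbf{f}_t + \epsilon_{it},$$

where $\boldsymbol{\beta}_i = (\beta_{i1}, \ldots, \beta_{im})$ is a row vector of loadings, and the joint model for the
$k$ assets at time $t$ is

$$\mathbf{r}_t = \boldsymbol{\alpha} + \boldsymbol{\beta}\mathbf{f}_t + \boldsymbol{\epsilon}_t, \quad t = 1, \ldots, T, \tag{9.2}$$

where $\mathbf{r}_t = (r_{1t}, \ldots, r_{kt})'$, $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_k)'$, $\boldsymbol{\beta} = [\beta_{ij}]$ is a $k \times m$ factor-loading
matrix, and $\boldsymbol{\epsilon}_t = (\epsilon_{1t}, \ldots, \epsilon_{kt})'$ is the error vector with $\mathrm{Cov}(\boldsymbol{\epsilon}_t) = \mathbf{D} =
\mathrm{diag}\{\sigma_1^2, \ldots, \sigma_k^2\}$, a $k \times k$ diagonal matrix. The covariance matrix of the return $\mathbf{r}_t$
is then

$$\mathrm{Cov}(\mathbf{r}_t) = \boldsymbol{\beta}\Sigma_f\boldsymbol{\beta}' + \mathbf{D}.$$

The model presentation in Eq. (9.2) is in a cross-sectional regression form if the
factors $f_{jt}$ are observed.

Treating the factor model in Eq. (9.1) as a time series, we have

$$\mathbf{R}_i = \alpha_i\mathbf{1}_T + \mathbf{F}\boldsymbol{\beta}_i' + \mathbf{E}_i, \tag{9.3}$$

for the $i$th asset ($i = 1, \ldots, k$), where $\mathbf{R}_i = (r_{i1}, \ldots, r_{iT})'$, $\mathbf{1}_T$ is a $T$-dimensional
vector of ones, $\mathbf{F}$ is a $T \times m$ matrix whose $t$th row is $\mathbf{f}_t'$, and $\mathbf{E}_i = (\epsilon_{i1}, \ldots, \epsilon_{iT})'$.
The covariance matrix of $\mathbf{E}_i$ is $\mathrm{Cov}(\mathbf{E}_i) = \sigma_i^2\mathbf{I}$, a $T \times T$ diagonal matrix.

Finally, we can rewrite Eq. (9.2) as

$$\mathbf{r}_t = \boldsymbol{\xi}\mathbf{g}_t + \boldsymbol{\epsilon}_t,$$

where $\mathbf{g}_t = (1, \mathbf{f}_t')'$ and $\boldsymbol{\xi} = [\boldsymbol{\alpha}, \boldsymbol{\beta}]$, which is a $k \times (m + 1)$ matrix. Taking the
transpose of the prior equation and stacking all data together, we have

$$\mathbf{R} = \mathbf{G}\boldsymbol{\xi}' + \mathbf{E}, \tag{9.4}$$

where $\mathbf{R}$ is a $T \times k$ matrix of returns whose $t$th row is $\mathbf{r}_t'$ or, equivalently, whose
$i$th column is $\mathbf{R}_i$ of Eq. (9.3), $\mathbf{G}$ is a $T \times (m + 1)$ matrix whose $t$th row is $\mathbf{g}_t'$,
and $\mathbf{E}$ is a $T \times k$ matrix of specific factors whose $t$th row is $\boldsymbol{\epsilon}_t'$. If the common
factors $\mathbf{f}_t$ are observed, then Eq. (9.4) is a special form of the multivariate linear
regression (MLR) model; see Johnson and Wichern (2007). For a general MLR
model, the covariance matrix of $\boldsymbol{\epsilon}_t$ need not be diagonal.


9.2 MACROECONOMETRIC FACTOR MODELS
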
For macroeconomic factor models, the factors are observed and we can apply the
least-squares method to the MLR model in Eq. (9.4) to perform estimation. The
estimate is

$$\hat{\boldsymbol{\xi}}' = \begin{bmatrix} \hat{\boldsymbol{\alpha}}' \\ \hat{\boldsymbol{\beta}}' \end{bmatrix} = (\mathbf{G}'\mathbf{G})^{-1}(\mathbf{G}'\mathbf{R}),$$

from which the estimates of $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are readily available. The residuals of Eq.
(9.4) are

$$\hat{\mathbf{E}} = \mathbf{R} - \mathbf{G}\hat{\boldsymbol{\xi}}'.$$

Based on the model assumption, the covariance matrix of $\boldsymbol{\epsilon}_t$ is estimated by

$$\hat{\mathbf{D}} = \mathrm{diag}\{\hat{\sigma}_1^2, \ldots, \hat{\sigma}_k^2\},$$

where $\hat{\sigma}_i^2$ is the $(i, i)$th element of $\hat{\mathbf{E}}'\hat{\mathbf{E}}/(T - m - 1)$. Furthermore, the $R^2$ of the
$i$th asset of Eq. (9.3) is

$$R_i^2 = 1 - \frac{[\hat{\mathbf{E}}'\hat{\mathbf{E}}]_{i,i}}{[\mathbf{R}'\mathbf{R}]_{i,i}}, \quad i = 1, \ldots, k,$$

where $A_{i,j}$ denotes the $(i, j)$th element of the matrix $A$.

Note that the aforementioned least-squares estimation does not impose the constraint
that the specific factors $\epsilon_{it}$ are uncorrelated with each other. Consequently,
the estimates obtained are not efficient in general. However, imposing the orthogonalization
constraint requires nontrivial computation and is often ignored. One
can check the off-diagonal elements of the matrix $\hat{\mathbf{E}}'\hat{\mathbf{E}}/(T - m - 1)$ to verify the
adequacy of the fitted model. These elements should be close to zero.
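The estimation steps above can be sketched in base R on simulated data. Everything in this sketch is an assumption made for illustration: the sample sizes, the true loadings, and the variable names are hypothetical. Since base R's `solve()` requires a square coefficient matrix, the normal equations are formed explicitly, and the $R^2$ uses mean-corrected returns in the denominator.

```r
set.seed(1)
T_len <- 200; k <- 4; m <- 2                     # hypothetical sizes
f     <- matrix(rnorm(T_len * m), T_len, m)      # observed factors
alpha <- c(0.5, -0.2, 0.1, 0.0)                  # hypothetical intercepts
beta  <- matrix(c(1.0, 0.5, -0.3,  0.8,
                  0.2, 0.0,  0.6, -0.4), k, m)   # hypothetical k x m loadings
E     <- matrix(rnorm(T_len * k, sd = 0.5), T_len, k)
R     <- rep(1, T_len) %o% alpha + f %*% t(beta) + E     # T x k returns

G      <- cbind(1, f)                                # T x (m + 1) design matrix
xi_hat <- solve(crossprod(G), crossprod(G, R))       # (G'G)^{-1} G'R
alpha_hat <- xi_hat[1, ]                             # first row: intercepts
beta_hat  <- t(xi_hat[-1, , drop = FALSE])           # k x m loading estimates

E_hat <- R - G %*% xi_hat                            # residual matrix
D_hat <- diag(crossprod(E_hat)) / (T_len - m - 1)    # specific variances
Rc    <- sweep(R, 2, colMeans(R))                    # mean-corrected returns
r_square <- 1 - diag(crossprod(E_hat)) / diag(crossprod(Rc))

# Off-diagonal residual correlations should be close to zero if the model is adequate
resi_cor <- cov2cor(crossprod(E_hat) / (T_len - m - 1))
```

The same coefficient matrix is returned by `coef(lm(R ~ f))`, which may be the more convenient route in practice.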


9.2.1 Single-Factor Model 


The best known macroeconomic factor model in finance is the market model; see
Sharpe (1970). This is a single-factor model and can be written as

$$r_{it} = \alpha_i + \beta_i r_{mt} + \epsilon_{it}, \quad i = 1, \ldots, k; \quad t = 1, \ldots, T, \tag{9.5}$$

where $r_{it}$ is the excess return of the $i$th asset, $r_{mt}$ is the excess return of the market,
and $\beta_i$ is the well-known $\beta$ for stock returns. To illustrate, we consider monthly
returns of 13 stocks and use the return of the S&P 500 index as the market return.
The stocks used and their tick symbols are given in Table 9.1, and the sample
period is from January 1990 to December 2003 so that $k = 13$ and $T = 168$. We
use the monthly series of 3-month Treasury bill rates of the secondary market as
the risk-free interest rate to obtain simple excess returns of the stock and market
index. The returns are in percentages.

TABLE 9.1 Stocks Used and Their Tick Symbols in Analysis of Single-Factor Model^a

Tick  Company            $\bar{r}(\sigma_r)$
AA    Alcoa              1.09(9.49)
AGE   A.G. Edwards       1.36(10.2)
CAT   Caterpillar        1.23(8.71)
F     Ford Motor         0.97(9.77)
FDX   FedEx              1.14(9.49)
GM    General Motors     0.64(9.28)
HPQ   Hewlett-Packard    1.37(11.8)
KMB   Kimberly-Clark     0.78(6.50)
MEL   Mellon Financial   1.36(7.80)
NYT   New York Times     0.81(7.37)
PG    Procter & Gamble   1.08(6.75)
TRB   Chicago Tribune    0.95(7.84)
TXN   Texas Instrument   2.19(13.8)
SP5   S&P 500 Index      0.42(4.33)

^a Sample means (standard errors) of excess returns are also given. The sample period is from January
1990 to December 2003.

We use S-Plus to implement the estimation method discussed in the previous 
section. Most of the commands used also apply to the software R. 


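> x=read.matrix("m-fac9003.txt",header=T)      # 13 stock columns plus the S&P 500 column
> xmtx=cbind(rep(1,168),x[,14])                # design matrix: constant and market excess return
> rtn=x[,1:13]                                 # excess returns of the 13 stocks
> xit.hat=solve(xmtx,rtn)                      # least-squares estimates of (alpha, beta)
> beta.hat=t(xit.hat[2,])
> E.hat=rtn-xmtx%*%xit.hat                     # residuals
> D.hat=diag(crossprod(E.hat)/(168-2))         # residual variances
> r.square=1-(168-2)*D.hat/diag(var(rtn,SumSquares=T))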


The estimates of $\beta_i$, $\sigma_i$, and $R_i^2$ for the $i$th asset return are given below:

> t(rbind(beta.hat,sqrt(D.hat),r.square))
     beta.hat  sigma(i)     r.square
AA      1.292     7.694        0.347
AGE     1.514     7.808        0.415
CAT     0.941     7.725        0.219
F       1.219     8.241        0.292
FDX     0.805     8.854        0.135
GM      1.046     8.130        0.238
HPQ     1.628     9.469        0.358
KMB     0.550     6.070        0.134
MEL     1.123     6.120        0.388
NYT     0.771     6.590        0.205
PG      0.469     6.459        0.090
TRB     0.718     [illegible]  0.157
TXN     1.796    11.474        0.316


Figure 9.1 shows the bar plots of $\hat{\beta}_i$ and $R_i^2$ of the 13 stocks. The financial
stocks, AGE and MEL, and the high-tech stocks, HPQ and TXN, seem to have
higher $\beta$ and $R^2$. On the other hand, KMB and PG have lower $\beta$ and $R^2$. The $R^2$
ranges from 0.09 to 0.41, indicating that the market return explains less than 50%
of the variability of the individual stocks used.

Figure 9.1 Bar plots of (a) beta and (b) $R^2$ for fitting single-factor market model to monthly excess
returns of 13 stocks. S&P 500 index excess return is used as market index. Sample period is from
January 1990 to December 2003.

The covariance and correlation matrices of r; under the market model can be 
estimated using the following: 


> cov.r=var(x[,14])*(t(beta.hat)%*%beta.hat)+diag(D.hat)
> sd.r=sqrt(diag(cov.r))
> corr.r=cov.r/outer(sd.r,sd.r)
> print(corr.r,digits=1,width=2)

We can compare these estimated correlations with the sample correlations of the
excess returns.

> print(cor(rtn),digits=1,width=2)
     AA AGE CAT   F FDX  GM HPQ KMB MEL NYT  PG TRB TXN
AA  1.0 0.3 0.6 0.5 0.2 0.4 0.5 0.3 0.4 0.4 0.1 0.3 0.5
AGE 0.3 1.0 0.3 0.3 0.3 0.3 0.3 0.3 0.4 0.4 0.2 0.2 0.3
CAT 0.6 0.3 1.0 0.4 0.2 0.3 0.2 0.3 0.4 0.3 0.1 0.4 0.3
F   0.5 0.3 0.4 1.0 0.3 0.6 0.3 0.3 0.4 0.4 0.1 0.3 0.3
FDX 0.2 0.3 0.2 0.3 1.0 0.2 0.3 0.3 0.2 0.2 0.1 0.3 0.2
GM  0.4 0.3 0.3 0.6 0.2 1.0 0.3 0.3 0.4 0.2 0.1 0.3 0.3
HPQ 0.5 0.3 0.2 0.3 0.3 0.3 1.0 0.1 0.3 0.3 0.1 0.2 0.6
KMB 0.3 0.3 0.3 0.2 0.3 0.3 0.1 1.0 0.3 0.2 0.3 0.3 0.2
MEL 0.4 0.4 0.4 0.4 0.2 0.4 0.3 0.4 1.0 0.3 0.4 0.3 0.3
NYT 0.4 0.4 0.3 0.4 0.3 0.2 0.3 0.2 0.3 1.0 0.2 0.5 0.2
PG  0.1 0.2 0.1 0.2 0.1 0.2 0.1 0.3 0.4 0.2 1.0 0.3 0.2
TRB 0.3 0.2 0.4 0.3 0.3 0.3 0.2 0.3 0.3 0.5 0.3 1.0 0.2
TXN 0.5 0.3 0.3 0.3 0.2 0.3 0.6 0.1 0.3 0.2 0.1 0.2 1.0

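In finance, one can use the concept of global minimum variance portfolio
(GMVP) to compare the covariance matrix implied by a fitted factor model with
the sample covariance matrix of the returns. For a given covariance matrix $\Sigma$, the
global minimum variance portfolio is the portfolio $\boldsymbol{\omega}$ that solves

$$\min_{\boldsymbol{\omega}} \sigma_{p,\omega}^2 = \boldsymbol{\omega}'\Sigma\boldsymbol{\omega} \quad \text{such that} \quad \boldsymbol{\omega}'\mathbf{1} = 1,$$

where $\sigma_{p,\omega}^2$ is the variance of the portfolio. The solution is given by

$$\boldsymbol{\omega} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}},$$

where $\mathbf{1}$ is the $k$-dimensional vector of ones.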
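The closed-form solution translates directly into a short function. The sketch below is in base R; the function name `gmvp()` and the $2 \times 2$ covariance matrix are hypothetical and serve only to show that the resulting weights sum to one and give a smaller variance than equal weighting.

```r
# Global minimum variance portfolio weights for a covariance matrix Sigma
gmvp <- function(Sigma) {
  ones <- rep(1, nrow(Sigma))
  w <- solve(Sigma, ones)    # Sigma^{-1} 1, without forming the inverse explicitly
  w / sum(w)                 # normalize by 1' Sigma^{-1} 1
}

Sigma <- matrix(c(4, 1,
                  1, 9), nrow = 2, byrow = TRUE)   # hypothetical covariance matrix
w      <- gmvp(Sigma)
var_w  <- drop(t(w) %*% Sigma %*% w)               # variance of the GMVP
w_eq   <- c(0.5, 0.5)
var_eq <- drop(t(w_eq) %*% Sigma %*% w_eq)         # variance of equal weights
```

For this matrix the weights are $8/11$ and $3/11$, so the lower-variance asset receives the larger weight.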


For the market model considered, the GMVP for the fitted model and the data 
are as follows: 


> w.gmin.model=solve(cov.r)%*%rep(1,nrow(cov.r))
> w.gmin.model=w.gmin.model/sum(w.gmin.model)
> t(w.gmin.model)
          AA     AGE    CAT      F    FDX     GM
[1,]  0.0117 -0.0306 0.0792 0.0225 0.0802 0.0533
         HPQ    KMB    MEL    NYT     PG    TRB     TXN
[1,] -0.0354 0.2503 0.0703 0.1539 0.2434 0.1400 -0.0388
> w.gmin.data=solve(var(rtn))%*%rep(1,nrow(cov.r))
> w.gmin.data=w.gmin.data/sum(w.gmin.data)
> t(w.gmin.data)
          AA     AGE    CAT       F    FDX     GM
[1,] -0.0073 -0.0085 0.0866 -0.0232 0.0943 0.0916
         HPQ    KMB    MEL    NYT     PG    TRB     TXN
[1,]  0.0345 0.2296 0.0495 0.1790 0.2651 0.0168 -0.0080


Comparing the two GMVPs, the weights assigned to TRB stock differ markedly. 
The two portfolios, however, have larger weights for KMB, NYT, and PG stocks. 

Finally, we examine the residual covariance and correlation matrices to verify
the assumption that the specific factors are not correlated among the 13 stocks. The
first four columns of the residual correlation matrix are given below, and there exist
some large values in the residual cross correlations, for example, Cor(CAT,AA) =
0.45 and Cor(GM,F) = 0.48.

> resi.cov=t(E.hat)%*%E.hat/(168-2)
> resi.sd=sqrt(diag(resi.cov))
> resi.cor=resi.cov/outer(resi.sd,resi.sd)
> print(resi.cor,digits=1,width=2)
       AA   AGE   CAT     F
AA   1.00 -0.13  0.45  0.22
AGE -0.13  1.00 -0.03 -0.01
CAT  0.45 -0.03  1.00  0.23
F    0.22 -0.01  0.23  1.00
FDX  0.00  0.14  0.05  0.07
GM   0.14 -0.09  0.15  0.48
HPQ  0.24 -0.13 -0.07 -0.00
KMB  0.16  0.06  0.18  0.05
MEL -0.02  0.06  0.09  0.10
NYT  0.13  0.10  0.07  0.19
PG  -0.15 -0.02 -0.01 -0.07
TRB  0.12 -0.02  0.25  0.16
TXN  0.19 -0.17  0.09 -0.02


9.2.2 Multifactor Models 


Chen, Roll, and Ross (1986) consider a multifactor model for stock returns. The fac- 
tors used consist of unexpected changes or surprises of macroeconomic variables. 
Here unexpected changes denote the residuals of the macroeconomic variables after 
removing their dynamic dependence. A simple way to obtain unexpected changes 
is to fit a VAR model of Chapter 8 to the macroeconomic variables. For illustration, 
we consider the following two monthly macroeconomic variables: 


1. Consumer price index (CPI) for all urban consumers: all items and with index
1982-1984 = 100.

2. Civilian employment numbers 16 years and over (CE16): measured in thousands.

Both CPI and CE16 series are seasonally adjusted, and the data span is from January
1975 to December 2003. We use a longer period to obtain the surprise series of the
variables. For both series, we construct the growth rate series by taking the first
difference of the logged data. The growth rates are in percentages.

To obtain the surprise series, we use the BIC criterion to identify a VAR(3) 
model. Thus, the two macroeconomic factors used in the factor model are the 
residuals of a VAR(3) model from 1990 to 2003. For the excess returns, we use 
the same 13 stocks as before. Details of the analysis follow: 


> da=read.table('m-cpice16-dp7503.txt',header=T)
> cpi=da[,1]
> cen=da[,2]
> x1=cbind(cpi,cen)
> y1=data.frame(x1)
> ord.choice=VAR(y1,max.ar=13)
> ord.choice$info
     ar(1)  ar(2)  ar(3)  ar(4)  ar(5)  ar(6)
BIC 36.992 38.093 28.234 46.241 60.677 75.810
     ar(7)  ar(8)  ar(9) ar(10) ar(11) ar(12) ar(13)
BIC  86.23 99.294 111.27 125.46 138.01 146.71 166.92
> var3.fit=VAR(x1~ar(3))
> res=var3.fit$residuals[166:333,1:2]
> da=matrix(scan(file='m-fac9003.txt'),14)
> xmtx=cbind(rep(1,168),res)
> da=t(da)
> rtn=da[,1:13]
> xit.hat=solve(xmtx,rtn)
> beta.hat=t(xit.hat[2:3,])
> E.hat=rtn-xmtx%*%xit.hat
> D.hat=diag(crossprod(E.hat)/(168-3))
> r.square=1-(168-3)*D.hat/diag(var(rtn,SumSquares=T))


Figure 9.2 shows the bar plots of the beta estimates and R? for the 13 stocks. It 
is interesting to see that all excess returns are negatively related to the unexpected 
changes of CPI growth rate. This seems reasonable. However, the R? of all excess 
returns are low, indicating that the two macroeconomic variables used have very 
little explanatory power in understanding the excess returns of the 13 stocks. 

The estimated covariance and correlation matrices of the two-factor model can 
be obtained using the following: 


> cov.rtn=beta.hat%*%var(res)%*%t(beta.hat)+diag(D.hat)
> sd.rtn=sqrt(diag(cov.rtn))
> cor.rtn=cov.rtn/outer(sd.rtn,sd.rtn)
> print(cor.rtn,digits=1,width=2)

[Figure 9.2 panels: Beta for CPI surprise; Beta for CE16 surprise; $R^2$]

Figure 9.2 Bar plots of betas and $R^2$ for fitting two-factor model to monthly excess returns of 13
stocks. Sample period is from January 1990 to December 2003.


The correlation matrix is very close to the identity matrix, indicating that the two- 
factor model used does not fit the excess returns well. Finally, the correlation matrix 
of the residuals of the two-factor model is given by the following: 


> cov.resi=t(E.hat)%*%E.hat/(168-3)
> sd.resi=sqrt(diag(cov.resi))
> cor.resi=cov.resi/outer(sd.resi,sd.resi)
> print(cor.resi,digits=1,width=2)


As expected, this correlation matrix is close to that of the original excess returns 
given before and is omitted. 


9.3 FUNDAMENTAL FACTOR MODELS 


Fundamental factor models use observable asset specific fundamentals such as 
industrial classification, market capitalization, book value, and style classification 
(growth or value) to construct common factors that explain the excess returns. 
There are two approaches to fundamental factor models available in the literature.
The first approach is proposed by Barr Rosenberg, founder of BARRA Inc., and is
referred to as the BARRA approach; see Grinold and Kahn (2000). In contrast to
the macroeconomic factor models, this approach treats the observed asset specific
fundamentals as the factor betas, $\boldsymbol{\beta}_i$, and estimates the factors $\mathbf{f}_t$ at each time
index $t$ via regression methods. The betas are time invariant, but the realizations
$\mathbf{f}_t$ evolve over time. The second approach is the Fama-French approach proposed
by Fama and French (1992). In this approach, the factor realization $f_{jt}$ for a given
specific fundamental is obtained by constructing some hedge portfolio based on
the observed fundamental. We briefly discuss the two approaches in the next two
sections.


9.3.1 BARRA Factor Model 


Assume that the excess returns and, hence, the factor realizations are mean corrected.
At each time index $t$, the factor model in Eq. (9.2) reduces to

$$\tilde{\mathbf{r}}_t = \boldsymbol{\beta}\mathbf{f}_t + \boldsymbol{\epsilon}_t, \tag{9.6}$$

where $\tilde{\mathbf{r}}_t$ denotes the (sample) mean-corrected excess returns and, for simplicity in
notation, we continue to use $\mathbf{f}_t$ as factor realizations. Since $\boldsymbol{\beta}$ is given, the model
in Eq. (9.6) is a multiple linear regression with $k$ observations and $m$ unknowns.
Because the number of common factors $m$ should be less than the number of
assets $k$, the regression is estimable. However, the regression is not homogeneous
because the covariance matrix of $\boldsymbol{\epsilon}_t$ is $\mathbf{D} = \mathrm{diag}\{\sigma_1^2, \ldots, \sigma_k^2\}$ with $\sigma_i^2 = \mathrm{Var}(\epsilon_{it})$,
which depends on the $i$th asset. Consequently, the factor realization at time index $t$
can be estimated by the weighted least-squares (WLS) method using the standard
errors of the specific factors as the weights. The resulting estimate is

$$\hat{\mathbf{f}}_t = (\boldsymbol{\beta}'\mathbf{D}^{-1}\boldsymbol{\beta})^{-1}(\boldsymbol{\beta}'\mathbf{D}^{-1}\tilde{\mathbf{r}}_t). \tag{9.7}$$


In practice, the covariance matrix $D$ is unknown so that we use a two-step procedure 
to perform the estimation. 

In step one, the ordinary least-squares (OLS) method is used at each time index 
$t$ to obtain a preliminary estimate of $f_t$ as 

$$\hat{f}_{t,o} = (\beta'\beta)^{-1} (\beta' \tilde{r}_t),$$

where the second subscript $o$ is used to denote the OLS estimate. This estimate of 
factor realization is consistent, but not efficient. The residual of the OLS regression is 

$$\hat{\epsilon}_{t,o} = \tilde{r}_t - \beta \hat{f}_{t,o}.$$

Since the residual covariance matrix is time invariant, we can pool the residuals 
together (for $t = 1, \ldots, T$) to obtain an estimate of $D$ as 

$$\hat{D}_o = \mathrm{diag}\left\{ \frac{1}{T-1} \sum_{t=1}^{T} \hat{\epsilon}_{t,o} \hat{\epsilon}_{t,o}' \right\}.$$


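In step two, we plug in the estimate $\hat{D}_o$ to obtain a refined estimate of the factor 
realization 

$$\hat{f}_{t,g} = (\beta' \hat{D}_o^{-1} \beta)^{-1} (\beta' \hat{D}_o^{-1} \tilde{r}_t), \tag{9.8}$$

where the second subscript $g$ denotes the generalized least-squares (GLS) estimate, 
which is a sample version of the WLS estimate. The residual of the refined 
regression is 

$$\hat{\epsilon}_{t,g} = \tilde{r}_t - \beta \hat{f}_{t,g},$$

from which we estimate the residual variance matrix as 

$$\hat{D}_g = \mathrm{diag}\left\{ \frac{1}{T-1} \sum_{t=1}^{T} \hat{\epsilon}_{t,g} \hat{\epsilon}_{t,g}' \right\}.$$

Finally, the covariance matrix of the estimated factor realizations is 

$$\hat{\Sigma}_f = \frac{1}{T-1} \sum_{t=1}^{T} (\hat{f}_{t,g} - \bar{f}_g)(\hat{f}_{t,g} - \bar{f}_g)',$$

where 

$$\bar{f}_g = \frac{1}{T} \sum_{t=1}^{T} \hat{f}_{t,g}.$$

From Eq. (9.6), the covariance matrix of the excess returns under the BARRA 
approach is 

$$\mathrm{Cov}(r_t) = \beta \hat{\Sigma}_f \beta' + \hat{D}_g.$$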
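The two-step procedure can be traced on simulated data. The following is a minimal sketch in R, not the code used in the text: the betas, the factor realizations, and all object names (beta, r.tilde, f.ols, f.gls, and so on) are hypothetical, and the divisor T - 1 follows the formulas above.

```r
# Minimal sketch of the two-step BARRA estimation on simulated data.
# All objects are hypothetical illustrations, not the data of the text.
set.seed(1)
k <- 6; m <- 2; n.obs <- 200
beta <- cbind(c(1, 1, 1, 0, 0, 0), c(0, 0, 0, 1, 1, 1))  # known k x m betas
f.true <- matrix(rnorm(m * n.obs, sd = 2), m, n.obs)     # m x T factor realizations
sig <- c(0.5, 1, 2, 0.5, 1, 2)                           # specific standard deviations
eps <- matrix(rnorm(k * n.obs), k, n.obs) * sig          # row i is scaled by sig[i]
r <- beta %*% f.true + eps
r.tilde <- r - rowMeans(r)                               # mean-corrected returns, k x T

# Step 1: OLS at every time index (one column per t)
f.ols <- solve(crossprod(beta), t(beta) %*% r.tilde)
e.ols <- r.tilde - beta %*% f.ols
d.ols <- rowSums(e.ols^2) / (n.obs - 1)                  # diagonal of D.hat.o

# Step 2: GLS with the estimated D, as in Eq. (9.8)
Dinv <- diag(1 / d.ols)
f.gls <- solve(t(beta) %*% Dinv %*% beta, t(beta) %*% Dinv %*% r.tilde)
e.gls <- r.tilde - beta %*% f.gls
d.gls <- rowSums(e.gls^2) / (n.obs - 1)                  # diagonal of D.hat.g

# Model-based covariance matrix of the excess returns
sigma.f <- var(t(f.gls))                                 # divisor T - 1
cov.r <- beta %*% sigma.f %*% t(beta) + diag(d.gls)
```

Storing the returns as a k x T matrix lets a single call to solve handle all time indexes at once, which is one convenient design choice among several.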


Industry Factor Model 

For illustration, we consider monthly excess returns of 10 stocks and use 
industrial classification as the specific asset fundamental. The stocks used are 
given in Table 9.2 and can be classified into three industrial sectors, namely, 
financial services, computer and high-tech industry, and other. The sample period 
is again from January 1990 to December 2003. Under the BARRA framework, 
there are three common factors representing the three industrial sectors and the 
betas are indicators for the three industrial sectors; that is, 

$$\tilde{r}_{it} = \beta_{i1} f_{1t} + \beta_{i2} f_{2t} + \beta_{i3} f_{3t} + \epsilon_{it}, \quad i = 1, \ldots, 10, \tag{9.9}$$

with the betas being 

$$\beta_{ij} = \begin{cases} 1 & \text{if asset } i \text{ belongs to the } j\text{th industrial sector,} \\ 0 & \text{otherwise,} \end{cases} \tag{9.10}$$

where j = 1, 2, 3 representing the financial, high-tech, and other sectors, respectively. 
For instance, the beta vector for the IBM stock return is $\beta_i = (0, 1, 0)'$ and 
that for Alcoa stock return is $\beta_i = (0, 0, 1)'$. 

TABLE 9.2 Stocks Used and Their Tick Symbols in Analysis of Industrial Factor Model^a

Tick   Company          $\bar{r}(\sigma_r)$    Tick   Company              $\bar{r}(\sigma_r)$
AGE    A.G. Edwards     1.36(10.2)     HPQ    Hewlett-Packard      1.37(11.8)
C      Citigroup        2.08(9.60)     IBM    Int. Bus. Machines   1.06(9.47)
MWD    Morgan Stanley   1.87(11.2)     AA     Alcoa                1.09(9.49)
MER    Merrill Lynch    2.08(10.4)     CAT    Caterpillar          1.23(8.71)
DELL   Dell Inc.        4.82(16.4)     PG     Procter & Gamble     1.08(6.75)

^a Sample mean and standard deviation of the excess returns are also given. The sample span is from 
January 1990 to December 2003. 

In Eq. (9.9), $f_{1t}$ is the factor realization of the financial services sector, $f_{2t}$ is 
that of the computer and high-tech sector, and $f_{3t}$ is for the other sector. Because 
the $\beta_{ij}$ are indicator variables, the OLS estimate of $f_t$ is extremely simple. Indeed, 
$\hat{f}_t$ is the vector consisting of the averages of sector excess returns at time $t$. 
Specifically, 

$$\hat{f}_{t,o} = \begin{bmatrix} (AGE_t + C_t + MWD_t + MER_t)/4 \\ (DELL_t + HPQ_t + IBM_t)/3 \\ (AA_t + CAT_t + PG_t)/3 \end{bmatrix}.$$

The specific factor of the ith asset is simply the deviation of its excess return 
from its industrial sample average. One can then obtain an estimate of the residual 
variance matrix D to perform the generalized least-squares estimation. We use 
S-Plus to perform the analysis. The commands also apply to R. First, load the 
returns into S-Plus, remove the sample means, create the industrial dummies, and 
compute the sample correlation matrix of the returns. 


> da=read.table("m-barra-9003.txt",header=T)
> rm = matrix(apply(da,2,mean),1)
> rtn = da - matrix(1,168,1)%*%rm
> fin = c(rep(1,4),rep(0,6))
> tech = c(rep(0,4),rep(1,3),rep(0,3))
> oth = c(rep(0,7),rep(1,3))
> ind.dum = cbind(fin,tech,oth)
> ind.dum
      fin tech oth
 [1,]   1    0   0
 [2,]   1    0   0
 [3,]   1    0   0
 [4,]   1    0   0
 [5,]   0    1   0
 [6,]   0    1   0
 [7,]   0    1   0
 [8,]   0    0   1
 [9,]   0    0   1
[10,]   0    0   1
> cov.rtn=var(rtn)
> sd.rtn=sqrt(diag(cov.rtn))
> corr.rtn=cov.rtn/outer(sd.rtn,sd.rtn)
> print(corr.rtn,digits=1,width=2)
     AGE   C MWD MER DELL HPQ  IBM  AA CAT  PG
AGE  1.0 0.6 0.6 0.6  0.3 0.3  0.3 0.3 0.3 0.2
C    0.6 1.0 0.7 0.7  0.2 0.4  0.4 0.4 0.4 0.3
MWD  0.6 0.7 1.0 0.8  0.3 0.5  0.4 0.4 0.3 0.3
MER  0.6 0.7 0.8 1.0  0.2 0.5  0.3 0.4 0.3 0.3
DELL 0.3 0.2 0.3 0.2  1.0 0.5  0.4 0.3 0.1 0.1
HPQ  0.3 0.4 0.5 0.5  0.4 1.0  0.5 0.5 0.2 0.1
IBM  0.3 0.4 0.4 0.3  0.4 0.5  1.0 0.4 0.3 -0.0
AA   0.3 0.4 0.4 0.4  0.3 0.5  0.4 1.0 0.6 0.1
CAT  0.3 0.4 0.3 0.3  0.2 0.2  0.3 0.6 1.0 0.1
PG   0.2 0.3 0.3 0.3  0.2 0.1 -0.0 0.1 0.1 1.0


The OLS estimates, their residuals, and residual variances are obtained as follows: 

> F.hat.o = solve(crossprod(ind.dum))%*%t(ind.dum)%*%rtn.rm
> E.hat.o = rtn.rm - ind.dum%*%F.hat.o
> diagD.hat.o=rowVars(E.hat.o)


One can then obtain the generalized least-squares estimates. 


> Dinv.hat = diag(diagD.hat.o^(-1))
> Hmtx=solve(t(ind.dum)%*%Dinv.hat%*%ind.dum)%*%t(ind.dum)%*%Dinv.hat
> F.hat.g = Hmtx%*%rtn.rm
> F.hat.gt=t(F.hat.g)
> E.hat.g = rtn.rm - ind.dum%*%F.hat.g
> diagD.hat.g = rowVars(E.hat.g)
> t(Hmtx)
[Output: 10 x 3 matrix of GLS weights with columns fin, tech, and oth. Only the oth 
column is legible: 0.0000 for rows 1 to 7 and 0.3319, 0.4321, 0.2360 for rows 8 to 10.]
> cov.ind=ind.dum%*%var(F.hat.gt)%*%t(ind.dum) + diag(diagD.hat.g)
> sd.ind=sqrt(diag(cov.ind))

> corr.ind=cov.ind/outer(sd.ind,sd.ind)
> print(corr.ind,digits=1,width=2)

The model-based correlations of stocks within an industrial sector are larger 
than their sample counterparts. For instance, the sample correlation between CAT 
and PG stock returns is only 0.1, but the correlation based on the fitted model is 
0.6. Finally, Figure 9.3 shows the time plots of the factor realizations based on the 
generalized least-squares estimation. 

Figure 9.3 Estimated factor realizations of BARRA industrial factor model for 10 monthly stock 
returns in 3 industrial sectors: (a) factor realizations: financial sector, (b) high-tech sector, and (c) other 
sector. 


Factor Mimicking Portfolio 
Consider the special case of BARRA factor models with a single factor. Here the 
WLS estimate of $f_t$ in Eq. (9.7) has a nice interpretation. Consider a portfolio 
$\omega = (\omega_1, \ldots, \omega_k)'$ of the k assets that solves 

$$\min_{\omega} \left( \tfrac{1}{2} \omega' D \omega \right) \quad \text{such that} \quad \omega'\beta = 1.$$

It turns out that the solution to this portfolio problem is given by 

$$\omega' = (\beta' D^{-1} \beta)^{-1} (\beta' D^{-1}).$$

Thus, the estimated factor realization is the portfolio return 

$$\hat{f}_t = \omega' \tilde{r}_t.$$


If the portfolio $\omega$ is normalized such that $\sum_{i=1}^{k} \omega_i = 1$, it is referred to as a factor 
mimicking portfolio. For multiple factors, one can apply the idea to each factor 
individually. 
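The single-factor solution is easy to check numerically. The sketch below uses made-up betas and specific variances (beta and d are hypothetical values, not estimates from the text) to confirm that the weights satisfy the constraint and give a smaller objective value than another feasible portfolio.

```r
# Hypothetical single-factor inputs: betas and specific variances (diagonal of D)
beta <- c(0.8, 1.0, 1.2, 0.9)
d <- c(4, 1, 9, 2.25)

# omega' = (beta' D^{-1} beta)^{-1} beta' D^{-1}, written elementwise for diagonal D
omega <- (beta / d) / sum(beta^2 / d)
constraint <- sum(omega * beta)          # omega' beta, equals 1 up to rounding

# Another portfolio that also satisfies alt' beta = 1, for comparison
alt <- beta / sum(beta^2)
obj.omega <- sum(omega^2 * d)            # omega' D omega
obj.alt <- sum(alt^2 * d)                # alt' D alt, no smaller than obj.omega

# Factor mimicking portfolio: rescale so that the weights sum to 1
omega.norm <- omega / sum(omega)
```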


Remark. In practice, the sample mean of an excess return is often not signif- 
icantly different from zero. Thus, one may not need to remove the sample mean 
before fitting a BARRA factor model. 


9.3.2 Fama-French Approach 


For a given asset fundamental (e.g., ratio of book-to-market value), Fama and 
French (1992) determined factor realizations using a two-step procedure. First, they 
sorted the assets based on the values of the observed fundamental. Then they formed 
a hedge portfolio, which is long in the top quintile of the sorted assets and 
short in the bottom quintile of the sorted assets. The observed return on this hedge 
portfolio at time $t$ is the observed factor realization for the given asset fundamental. 
The procedure is repeated for each asset fundamental under consideration. Finally, 
given the observed factor realizations $\{f_t \mid t = 1, \ldots, T\}$, the betas for each asset 
are estimated using a time series regression method. These authors identify three 
observed fundamentals that explain high percentages of variability in excess returns. 
The three fundamentals used by Fama and French are (a) the overall market return 
(market excess return), (b) the performance of small stocks relative to large stocks 
(SMB, small minus big), and (c) the performance of value stocks relative to growth 
stocks (HML, high minus low). Size is measured by market equity, and the ratio of 
book equity to market equity is used to define value and growth stocks, with value 
stocks having a high ratio of book equity to market equity. 
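The sorting step can be sketched for a single time period as follows. This is a simplified illustration on simulated data, assuming equal weights within each quintile; the variable names (bm, ret, f.t) are hypothetical and the sketch does not reproduce the portfolio construction details of Fama and French.

```r
# Simulated cross section at one time index (hypothetical data)
set.seed(2)
n <- 50
bm <- runif(n)                          # observed fundamental, e.g. book-to-market
ret <- rnorm(n, mean = 0.5 * bm)        # asset returns for the period

ord <- order(bm, decreasing = TRUE)     # sort assets by the fundamental
q <- floor(n / 5)                       # number of assets in one quintile
top <- ord[1:q]                         # highest values of the fundamental
bottom <- ord[(n - q + 1):n]            # lowest values of the fundamental

# Hedge portfolio return: long the top quintile, short the bottom quintile
f.t <- mean(ret[top]) - mean(ret[bottom])
```

Repeating the calculation for every period gives the series of factor realizations, after which the betas of each asset can be estimated by a time series regression (for example with lm).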


Remark. The concept of a factor may differ between factor models. The three 
factors used in the Fama-French approach are three financial fundamentals. One 
can combine the fundamentals to create a new attribute of the stocks and refer to the 
resulting model as a single-factor model. This is particularly so because the model 
used is a linear statistical model. Thus, care must be exercised when one refers to 
the number of factors in a factor model. On the other hand, the number of factors 
is better defined in statistical factor models, which we discuss next. 


9.4 PRINCIPAL COMPONENT ANALYSIS 


An important topic in multivariate time series analysis is the study of the covariance 
(or correlation) structure of the series. For example, the covariance structure of a 
vector return series plays an important role in portfolio selection. In what follows, 
we discuss some statistical methods useful in studying the covariance structure of 
a vector time series. 

Given a k-dimensional random variable $r = (r_1, \ldots, r_k)'$ with covariance matrix 
$\Sigma_r$, a principal component analysis (PCA) is concerned with using a few linear 
combinations of $r_i$ to explain the structure of $\Sigma_r$. If $r$ denotes the monthly log 
returns of k assets, then PCA can be used to study the main source of variations 
of these k asset returns. Here the keyword is few so that simplification can be 
achieved in multivariate analysis. 


9.4.1 Theory of PCA 


Principal component analysis applies to either the covariance matrix $\Sigma_r$ or the 
correlation matrix $\rho_r$ of $r$. Since the correlation matrix is the covariance matrix 
of the standardized random vector $r^* = S^{-1} r$, where $S$ is the diagonal matrix 
of standard deviations of the components of $r$, we use covariance matrix in our 
theoretical discussion. Let $w_i = (w_{i1}, \ldots, w_{ik})'$ be a k-dimensional real-valued 
vector, where $i = 1, \ldots, k$. Then 

$$y_i = w_i' r = \sum_{j=1}^{k} w_{ij} r_j$$

is a linear combination of the random vector $r$. If $r$ consists of the simple returns 
of k stocks, then $y_i$ is the return of a portfolio that assigns weight $w_{ij}$ to the 
jth stock. Since multiplying $w_i$ by a constant does not affect the proportion 
of allocation assigned to the jth stock, we standardize the vector $w_i$ so that 
$w_i' w_i = \sum_{j=1}^{k} w_{ij}^2 = 1$. 

Using properties of a linear combination of random variables, we have 

$$\mathrm{Var}(y_i) = w_i' \Sigma_r w_i, \quad i = 1, \ldots, k, \tag{9.11}$$
$$\mathrm{Cov}(y_i, y_j) = w_i' \Sigma_r w_j, \quad i, j = 1, \ldots, k. \tag{9.12}$$

The idea of PCA is to find linear combinations $w_i$ such that $y_i$ and $y_j$ are 
uncorrelated for $i \neq j$ and the variances of $y_i$ are as large as possible. More specifically: 

1. The first principal component of $r$ is the linear combination $y_1 = w_1' r$ that 
maximizes $\mathrm{Var}(y_1)$ subject to the constraint $w_1' w_1 = 1$. 

2. The second principal component of $r$ is the linear combination $y_2 = w_2' r$ 
that maximizes $\mathrm{Var}(y_2)$ subject to the constraints $w_2' w_2 = 1$ and $\mathrm{Cov}(y_2, y_1) = 0$. 

3. The ith principal component of $r$ is the linear combination $y_i = w_i' r$ that 
maximizes $\mathrm{Var}(y_i)$ subject to the constraints $w_i' w_i = 1$ and $\mathrm{Cov}(y_i, y_j) = 0$ 
for $j = 1, \ldots, i - 1$. 


Since the covariance matrix $\Sigma_r$ is nonnegative definite, it has a spectral 
decomposition; see Appendix A of Chapter 8. Let $(\lambda_1, e_1), \ldots, (\lambda_k, e_k)$ be 
the eigenvalue-eigenvector pairs of $\Sigma_r$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k \geq 0$ and 
$e_i = (e_{i1}, \ldots, e_{ik})'$, which is properly normalized. We have the following 
statistical result. 


Result 9.1. The ith principal component of $r$ is $y_i = e_i' r = \sum_{j=1}^{k} e_{ij} r_j$ for 
$i = 1, \ldots, k$. Moreover, 

$$\mathrm{Var}(y_i) = e_i' \Sigma_r e_i = \lambda_i, \quad i = 1, \ldots, k,$$
$$\mathrm{Cov}(y_i, y_j) = e_i' \Sigma_r e_j = 0, \quad i \neq j.$$


If some eigenvalues $\lambda_i$ are equal, the choices of the corresponding eigenvectors $e_i$ 
and hence $y_i$ are not unique. In addition, we have 

$$\sum_{i=1}^{k} \mathrm{Var}(r_i) = \mathrm{tr}(\Sigma_r) = \sum_{i=1}^{k} \lambda_i = \sum_{i=1}^{k} \mathrm{Var}(y_i). \tag{9.13}$$

The result of Eq. (9.13) says that 

$$\frac{\mathrm{Var}(y_i)}{\sum_{i=1}^{k} \mathrm{Var}(r_i)} = \frac{\lambda_i}{\lambda_1 + \cdots + \lambda_k}.$$

Consequently, the proportion of total variance in $r$ explained by the ith principal 
component is simply the ratio between the ith eigenvalue and the sum of all 
eigenvalues of $\Sigma_r$. One can also compute the cumulative proportion of total variance 
explained by the first i principal components [i.e., $(\sum_{j=1}^{i} \lambda_j)/(\sum_{j=1}^{k} \lambda_j)$]. 
In practice, one selects a small i such that the resulting cumulative proportion is 
large. 

Since $\mathrm{tr}(\rho_r) = k$, the proportion of variance explained by the ith principal 
component becomes $\lambda_i / k$ when the correlation matrix is used to perform the PCA. 

A by-product of the PCA is that a zero eigenvalue of $\Sigma_r$, or $\rho_r$, indicates the 
existence of an exact linear relationship between the components of $r$. For instance, 
if the smallest eigenvalue $\lambda_k = 0$, then by Result 9.1 $\mathrm{Var}(y_k) = 0$. Therefore, $y_k = 
\sum_{j=1}^{k} e_{kj} r_j$ is a constant and there are only k - 1 random quantities in $r$. In this 
case, the dimension of $r$ can be reduced. For this reason, PCA has been used in 
the literature as a tool for dimension reduction. 
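Result 9.1 and Eq. (9.13) can be verified numerically with the base R function eigen. The sketch below uses simulated data; the matrix x and the other object names are hypothetical.

```r
# Simulated data: 300 observations of a 4-dimensional vector (hypothetical)
set.seed(3)
mix <- matrix(c(2, 1, 0, 0,  0, 1, 1, 0,  0, 0, 1, 1,  0, 0, 0, 0.5), 4, 4)
x <- matrix(rnorm(300 * 4), 300, 4) %*% mix

S <- cov(x)                               # sample covariance matrix
eg <- eigen(S, symmetric = TRUE)          # eigenvalues in decreasing order
lambda <- eg$values

# Principal components y_i = e_i' r, computed from the centered data
y <- scale(x, center = TRUE, scale = FALSE) %*% eg$vectors

total.var <- sum(diag(S))                 # tr(S), the total variance
prop <- lambda / sum(lambda)              # proportion explained by each component
cum.prop <- cumsum(prop)                  # cumulative proportion
pc.var <- apply(y, 2, var)                # sample variances of the components
```

Here the sample variances of the components reproduce the eigenvalues, the components are mutually uncorrelated, and the eigenvalues sum to the trace of S.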


9.4.2 Empirical PCA 


In application, the covariance matrix $\Sigma_r$ and the correlation matrix $\rho_r$ of the return 
vector $r$ are unknown, but they can be estimated consistently by the sample covariance 
and correlation matrices under some regularity conditions. Assuming that the 
returns are weakly stationary and the data consist of $\{r_t \mid t = 1, \ldots, T\}$, we have 
the following estimates: 

$$\hat{\Sigma}_r \equiv [\hat{\sigma}_{ij,r}] = \frac{1}{T-1} \sum_{t=1}^{T} (r_t - \bar{r})(r_t - \bar{r})', \quad \bar{r} = \frac{1}{T} \sum_{t=1}^{T} r_t, \tag{9.14}$$
$$\hat{\rho}_r = \hat{S}^{-1} \hat{\Sigma}_r \hat{S}^{-1}, \tag{9.15}$$

where $\hat{S} = \mathrm{diag}\{\sqrt{\hat{\sigma}_{11,r}}, \ldots, \sqrt{\hat{\sigma}_{kk,r}}\}$ is the diagonal matrix of sample standard 
errors of $r_t$. Methods to compute eigenvalues and eigenvectors of a symmetric 
matrix can then be used to perform the PCA. Most statistical packages now have 
the capability to perform principal component analysis. In R and S-Plus, the basic 
command of PCA is princomp, and in FinMetrics the command is mfactor. 
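One detail is worth keeping in mind when comparing software output with Eq. (9.14). In R, princomp computes the covariance matrix with divisor T rather than T - 1, so its squared standard deviations differ from the eigenvalues of the sample covariance matrix by the factor (T - 1)/T; with cor=TRUE the two agree. The following sketch on simulated data illustrates the point; the data and object names are hypothetical.

```r
# Simulated data: 120 observations of a 3-dimensional vector (hypothetical)
set.seed(6)
n <- 120
x <- matrix(rnorm(n * 3), n, 3) %*% matrix(c(1, 0.5, 0.2,  0, 1, 0.4,  0, 0, 1), 3, 3)

# PCA on the covariance matrix: princomp (divisor n) versus eigen on cov(x) (divisor n - 1)
pc <- princomp(x)
ev <- eigen(cov(x), symmetric = TRUE)$values
ratio <- as.numeric(pc$sdev^2) / ev          # each element equals (n - 1)/n

# PCA on the correlation matrix: the divisor cancels out
pc.cor <- princomp(x, cor = TRUE)
ev.cor <- eigen(cor(x), symmetric = TRUE)$values
```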


Example 9.1. Consider the monthly log stock returns of International Business 
Machines, Hewlett-Packard, Intel Corporation, J.P. Morgan Chase, and Bank of 
America from January 1990 to December 2008. The returns are in percentages and 
include dividends. The data set has 228 observations. Figure 9.4 shows the time 
plots of these five monthly return series. As expected, returns of companies in the 
same industrial sector tend to exhibit similar patterns. 


Denote the returns by $r' = (IBM, HPQ, INTC, JPM, BAC)$. The sample mean 
vector of the returns is $(0.70, 0.99, 1.20, 0.82, 0.41)'$ and the sample covariance 
and correlation matrices are 

Figure 9.4 Time plots of monthly log stock returns in percentages and including dividends for (a) 
International Business Machines, (b) Hewlett-Packard, (c) Intel, (d) J.P. Morgan Chase, and (e) Bank 
of America from January 1990 to December 2008. 

TABLE 9.3 Results of Principal Component Analysis for Monthly Log Returns, 
Including Dividends of Stocks of IBM, Hewlett-Packard, Intel, J.P. Morgan Chase, 
and Bank of America from January 1990 to December 2008^a

Using Sample Covariance Matrix 
Eigenvalue    284.17   112.93    57.43    46.81    29.87
Proportion     0.535    0.213    0.108    0.088    0.056
Cumulative     0.535    0.748    0.856    0.944    1.000
Eigenvector    0.330    0.139   -0.264    0.895   -0.014
               0.483    0.279   -0.701   -0.430   -0.116
               0.581    0.478    0.652   -0.096   -0.016
               0.448   -0.550    0.013   -0.064    0.702
               0.347   -0.610    0.119   -0.009   -0.702

Using Sample Correlation Matrix 
Eigenvalue     2.607    1.072    0.569    0.451    0.301
Proportion     0.522    0.214    0.114    0.090    0.060
Cumulative     0.522    0.736    0.850    0.940    1.000
Eigenvector    0.428    0.341    0.837   -0.002    0.008
               0.460    0.356   -0.380    0.704    0.145
               0.451    0.385   -0.389   -0.704    0.022
               0.479   -0.469   -0.046    0.052   -0.739
               0.416   -0.623    0.035   -0.073    0.658

^a The eigenvectors are in columns. 

Table 9.3 gives the results of PCA using both the covariance and correlation 
matrices. Also given are eigenvalues, eigenvectors, and proportions of variabilities 
explained by the principal components. Consider the correlation matrix and denote 
the sample eigenvalues and eigenvectors by $\hat{\lambda}_i$ and $\hat{e}_i$. We have 

$$\hat{\lambda}_1 = 2.608, \quad \hat{e}_1 = (0.428, 0.460, 0.451, 0.479, 0.416)',$$
$$\hat{\lambda}_2 = 1.072, \quad \hat{e}_2 = (0.341, 0.356, 0.385, -0.469, -0.623)'$$


for the first two principal components. These two components explain about 74% 
of the total variability of the data, and they have interesting interpretations. The first 
component is a roughly equally weighted linear combination of the stock returns. 
This component might represent the general movement of the stock market and 
hence is a market component. The second component represents the difference 
between the two industrial sectors—namely, technologies versus financial services. 
It might be an industrial component. Similar interpretations of principal components 
can also be found by using the covariance matrix of r. 

An informal but useful procedure to determine the number of principal components 
needed in an application is to examine the scree plot, which is the plot of 
the eigenvalues $\hat{\lambda}_i$ ordered from the largest to the smallest (i.e., a plot of $\hat{\lambda}_i$ versus 
i). Figure 9.5(a) shows the scree plot for the five stock returns of Example 9.1. By 
looking for an elbow in the scree plot, indicating that the remaining eigenvalues 
are relatively small and all about the same size, one can determine the appropriate 
number of components. For both plots in Figure 9.5, two components appear to be 
appropriate. Finally, except for the case in which $\hat{\lambda}_j = 0$ for $j > i$, selecting the 
first i principal components only provides an approximation to the total variance 
of the data. If a small i can provide a good approximation, then the simplification 
becomes valuable. 

Figure 9.5 Scree plots for two 5-dimensional asset returns: (a) series of Example 9.1 and (b) bond 
index returns of Example 9.3. 


Remark. The R and S-Plus commands used to perform the PCA are given 
below. The command princomp gives the square root of the eigenvalue and 
denotes it as standard deviation. 


> rtn=read.table("m-5clog-9008.txt",header=T)
> pca.cov = princomp(rtn)
> names(pca.cov)
> summary(pca.cov)
> pca.cov$loadings
> screeplot(pca.cov)
> pca.corr=princomp(rtn,cor=T)
> summary(pca.corr)


9.5 STATISTICAL FACTOR ANALYSIS 


We now turn to statistical factor analysis. One of the main difficulties in 
multivariate statistical analysis is the "curse of dimensionality." For serially correlated 
data, the number of parameters of a parametric model often increases dramatically 
when the order of the model or the dimension of the time series is increased. 
Simplifying methods are often sought to overcome the curse of dimensionality. From 
an empirical viewpoint, multivariate data often exhibit similar patterns indicating 
the existence of common structure hidden in the data. Statistical factor analysis is 
one of those simplifying methods available in the literature. The aim of statistical 
factor analysis is to identify, from the observed data, a few factors that can account 
for most of the variations in the covariance or correlation matrix of the data. 

Traditional statistical factor analysis assumes that the data have no serial 
correlations. This assumption is often violated by financial data sampled at intervals 
of a week or shorter. However, the assumption appears to be reasonable 
for asset returns with lower frequencies (e.g., monthly returns of stocks or market 
indexes). If the assumption is violated, then one can use the parametric models 
discussed in this book to remove the linear dynamic dependence of the data and 
apply factor analysis to the residual series. 

In what follows, we discuss statistical factor analysis based on the orthogonal 
factor model. Consider the return $r_t = (r_{1t}, \ldots, r_{kt})'$ of k assets at time period $t$ and 
assume that the return series $r_t$ is weakly stationary with mean $\mu$ and covariance 
matrix $\Sigma_r$. The statistical factor model postulates that $r_t$ is linearly dependent on 
a few unobservable random variables $f_t = (f_{1t}, \ldots, f_{mt})'$ and k additional noises 
$\epsilon_t = (\epsilon_{1t}, \ldots, \epsilon_{kt})'$. Here $m < k$, $f_{jt}$ are the common factors, and $\epsilon_{it}$ are the errors. 
Mathematically, the statistical factor model is also in the form of Eq. (9.1) except 
that the intercept $\alpha$ is replaced by the mean return $\mu$. Thus, a statistical factor 
model is in the form 

$$r_t - \mu = \beta f_t + \epsilon_t, \tag{9.16}$$

where $\beta = [\beta_{ij}]_{k \times m}$ is the matrix of factor loadings, $\beta_{ij}$ is the loading of the ith 
variable on the jth factor, and $\epsilon_{it}$ is the specific error of $r_{it}$. A key feature of 
the statistical factor model is that the m factors $f_{jt}$ and the factor loadings $\beta_{ij}$ are 
unobservable. As such, Eq. (9.16) is not a multivariate linear regression model, 
even though it has a similar appearance. This special feature also distinguishes a 
statistical factor model from other factor models discussed earlier. 

The factor model in Eq. (9.16) is an orthogonal factor model if it satisfies the 
following assumptions: 

1. $E(f_t) = 0$ and $\mathrm{Cov}(f_t) = I_m$, the m x m identity matrix. 

2. $E(\epsilon_t) = 0$ and $\mathrm{Cov}(\epsilon_t) = D = \mathrm{diag}\{\sigma_1^2, \ldots, \sigma_k^2\}$ (i.e., $D$ is a k x k diagonal 
matrix). 

3. $f_t$ and $\epsilon_t$ are independent so that $\mathrm{Cov}(f_t, \epsilon_t) = E(f_t \epsilon_t') = 0_{m \times k}$. 

Under the previous assumptions, it is easy to see that 

$$\Sigma_r = \mathrm{Cov}(r_t) = E[(r_t - \mu)(r_t - \mu)'] = E[(\beta f_t + \epsilon_t)(\beta f_t + \epsilon_t)'] = \beta\beta' + D \tag{9.17}$$

and 

$$\mathrm{Cov}(r_t, f_t) = E[(r_t - \mu) f_t'] = \beta E(f_t f_t') + E(\epsilon_t f_t') = \beta. \tag{9.18}$$

Using Eqs. (9.17) and (9.18), we see that for the orthogonal factor model in Eq. 
(9.16) 

$$\mathrm{Var}(r_{it}) = \beta_{i1}^2 + \cdots + \beta_{im}^2 + \sigma_i^2,$$
$$\mathrm{Cov}(r_{it}, r_{jt}) = \beta_{i1}\beta_{j1} + \cdots + \beta_{im}\beta_{jm},$$
$$\mathrm{Cov}(r_{it}, f_{jt}) = \beta_{ij}.$$

The quantity $\beta_{i1}^2 + \cdots + \beta_{im}^2$, which is the portion of the variance of $r_{it}$ contributed 
by the m common factors, is called the communality. The remaining portion $\sigma_i^2$ 
of the variance of $r_{it}$ is called the uniqueness or specific variance. Let $c_i^2 = \beta_{i1}^2 + 
\cdots + \beta_{im}^2$ be the communality, which is the sum of squares of the loadings of the 
ith variable on the m common factors. The variance of component $r_{it}$ becomes 
$\mathrm{Var}(r_{it}) = c_i^2 + \sigma_i^2$. 

In practice, not every covariance matrix has an orthogonal factor representation. 
In other words, there exists a random variable $r_t$ that does not have any orthogonal 
factor representation. Furthermore, the orthogonal factor representation of a random 
variable is not unique. In fact, for any m x m orthogonal matrix $P$ satisfying $PP' 
= P'P = I$, let $\beta^* = \beta P$ and $f_t^* = P' f_t$. Then 

$$r_t - \mu = \beta f_t + \epsilon_t = \beta P P' f_t + \epsilon_t = \beta^* f_t^* + \epsilon_t.$$

In addition, $E(f_t^*) = 0$ and $\mathrm{Cov}(f_t^*) = P' \mathrm{Cov}(f_t) P = P'P = I$. Thus, $\beta^*$ and 
$f_t^*$ form another orthogonal factor model for $r_t$. This nonuniqueness of orthogonal 
factor representation is a weakness as well as an advantage for factor analysis. It 
is a weakness because it makes the meaning of factor loading arbitrary. It is an 
advantage because it allows us to perform rotations to find common factors that 
have nice interpretations. Because $P$ is an orthogonal matrix, the transformation 
$f_t^* = P' f_t$ is a rotation in the m-dimensional space. 
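The nonuniqueness is simple to confirm numerically. In the sketch below, the loading matrix beta and the rotation angle theta are arbitrary hypothetical values; the check is that the common part of the covariance matrix and the communalities are unchanged by the rotation.

```r
# Hypothetical 5 x 2 matrix of factor loadings
beta <- matrix(c(0.9, 0.8, 0.7, 0.2, 0.1,
                 0.1, 0.2, 0.3, 0.8, 0.9), 5, 2)

# A 2 x 2 orthogonal matrix: rotation by an arbitrary angle
theta <- pi / 6
P <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)

beta.star <- beta %*% P                              # rotated loadings

# beta beta' is unchanged, so Sigma_r = beta beta' + D is unchanged too
diff.common <- max(abs(beta %*% t(beta) - beta.star %*% t(beta.star)))

# Communalities (row sums of squared loadings) are also unchanged
comm <- rowSums(beta^2)
comm.star <- rowSums(beta.star^2)
```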


9.5.1 Estimation 


The orthogonal factor model in Eq. (9.16) can be estimated by two methods. 
The first estimation method uses the principal component analysis of the previous 
section. This method requires neither the normality assumption of the data nor 
the prespecification of the number of common factors. It applies to both the covariance 
and correlation matrices. But as mentioned in PCA, the solution is often an 
approximation. The second estimation method is the maximum-likelihood method 
that uses normal density and requires a prespecification for the number of common 
factors. 


Principal Component Method 

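Again let $(\hat{\lambda}_1, \hat{e}_1), \ldots, (\hat{\lambda}_k, \hat{e}_k)$ be pairs of the eigenvalues and eigenvectors of the 
sample covariance matrix $\hat{\Sigma}_r$, where $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_k$. Let $m < k$ be the number 
of common factors. Then the matrix of factor loadings is given by 

$$\hat{\beta} \equiv [\hat{\beta}_{ij}] = \left[ \sqrt{\hat{\lambda}_1}\,\hat{e}_1 \;\Big|\; \sqrt{\hat{\lambda}_2}\,\hat{e}_2 \;\Big|\; \cdots \;\Big|\; \sqrt{\hat{\lambda}_m}\,\hat{e}_m \right]. \tag{9.19}$$

The estimated specific variances are the diagonal elements of the matrix $\hat{\Sigma}_r - \hat{\beta}\hat{\beta}'$. 
That is, $\hat{D} = \mathrm{diag}\{\hat{\sigma}_1^2, \ldots, \hat{\sigma}_k^2\}$, where $\hat{\sigma}_i^2 = \hat{\sigma}_{ii,r} - \sum_{j=1}^{m} \hat{\beta}_{ij}^2$ and $\hat{\sigma}_{ii,r}$ is the 
(i, i)th element of $\hat{\Sigma}_r$. The communalities are estimated by 

$$\hat{c}_i^2 = \hat{\beta}_{i1}^2 + \cdots + \hat{\beta}_{im}^2.$$

The error matrix caused by approximation is 

$$\hat{\Sigma}_r - (\hat{\beta}\hat{\beta}' + \hat{D}).$$

Ideally, we would like this matrix to be close to zero. It can be shown that the 
sum of squared elements of $\hat{\Sigma}_r - (\hat{\beta}\hat{\beta}' + \hat{D})$ is less than or equal to $\hat{\lambda}_{m+1}^2 + \cdots + 
\hat{\lambda}_k^2$. Therefore, the approximation error is bounded by the sum of squares of the 
neglected eigenvalues. 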

From the solution in Eq. (9.19), the estimated factor loadings based on the 
principal component method do not change as the number of common factors m 
is increased. 
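The principal component method takes only a few lines of R. The following sketch works on simulated data and is meant as an illustration of Eq. (9.19) and the error bound, not as the code behind the examples of this chapter; the loading matrix B and all object names are hypothetical.

```r
# Simulated returns from a 2-factor model (hypothetical loadings B)
set.seed(4)
n.obs <- 500; k <- 5; m <- 2
B <- matrix(c(0.9, 0.8, 0.7, 0.1, 0.1,
              0.1, 0.1, 0.2, 0.8, 0.9), k, m)
f <- matrix(rnorm(n.obs * m), n.obs, m)
x <- f %*% t(B) + matrix(rnorm(n.obs * k, sd = 0.5), n.obs, k)

S <- cov(x)                                      # sample covariance matrix
eg <- eigen(S, symmetric = TRUE)

# Eq. (9.19): column j of beta.hat is sqrt(lambda_j) * e_j
beta.hat <- eg$vectors[, 1:m] %*% diag(sqrt(eg$values[1:m]))
comm <- rowSums(beta.hat^2)                      # estimated communalities
d.hat <- diag(S) - comm                          # estimated specific variances

# Error matrix and the bound given by the neglected eigenvalues
err <- S - (beta.hat %*% t(beta.hat) + diag(d.hat))
err.ss <- sum(err^2)
bound <- sum(eg$values[(m + 1):k]^2)

# Loadings of the first factor are the same whether m = 1 or m = 2
beta.hat.1 <- eg$vectors[, 1] * sqrt(eg$values[1])
```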


Maximum-Likelihood Method 
If the common factors $f_t$ and the specific factors $\epsilon_t$ are jointly normal, then $r_t$ 
is multivariate normal with mean $\mu$ and covariance matrix $\Sigma_r = \beta\beta' + D$. The 
maximum-likelihood method can then be used to obtain estimates of $\beta$ and $D$ under 
the constraint $\beta' D^{-1} \beta = \Delta$, which is a diagonal matrix. Here $\mu$ is estimated by 
the sample mean. For more details of this method, readers are referred to Johnson 
and Wichern (2007). 

In using the maximum-likelihood method, the number of common factors must 
be given a priori. In practice, one can use a modified likelihood ratio test to check 
the adequacy of a fitted m-factor model. The test statistic is 

$$LR(m) = -\left[ T - 1 - \tfrac{1}{6}(2k + 5) - \tfrac{2}{3} m \right] \left( \ln|\hat{\Sigma}_r| - \ln|\hat{\beta}\hat{\beta}' + \hat{D}| \right), \tag{9.20}$$

which, under the null hypothesis of m factors, is asymptotically distributed as a 
chi-squared distribution with $\tfrac{1}{2}[(k - m)^2 - k - m]$ degrees of freedom. We discuss 
some methods for selecting m in Section 9.6.1. 
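In R, the base function factanal fits the maximum-likelihood factor model to the correlation matrix and reports a test statistic of this type together with its degrees of freedom and p value. The sketch below applies it to simulated data; the loading matrix B and the object names are hypothetical, and the example is an illustration rather than a reproduction of any result in the text.

```r
# Simulated data from a 2-factor model with unit variances (hypothetical loadings B)
set.seed(5)
n.obs <- 400; k <- 6; m <- 2
B <- matrix(c(0.9, 0.8, 0.7, 0.1, 0.1, 0.2,
              0.1, 0.2, 0.1, 0.8, 0.9, 0.7), k, m)
x <- matrix(rnorm(n.obs * m), n.obs, m) %*% t(B) +
     matrix(rnorm(n.obs * k), n.obs, k) %*% diag(sqrt(1 - rowSums(B^2)))

# Maximum-likelihood factor analysis without rotation
fit <- factanal(x, factors = m, rotation = "none")

lr.stat <- fit$STATISTIC                  # likelihood ratio type test statistic
lr.dof <- fit$dof                         # degrees of freedom reported by factanal
lr.pval <- fit$PVAL                       # p value of the adequacy test

# Degrees of freedom from the formula in the text
dof.formula <- ((k - m)^2 - k - m) / 2
```

A small p value would suggest that m factors are not adequate for the data at hand.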


9.5.2 Factor Rotation 


As mentioned before, for any m x m orthogonal matrix $P$, 

$$r_t - \mu = \beta f_t + \epsilon_t = \beta^* f_t^* + \epsilon_t,$$

where $\beta^* = \beta P$ and $f_t^* = P' f_t$. In addition, 

$$\beta\beta' + D = \beta P P' \beta' + D = \beta^* (\beta^*)' + D.$$

This result indicates that the communalities and the specific variances remain 
unchanged under an orthogonal transformation. It is then reasonable to find an 
orthogonal matrix $P$ to transform the factor model so that the common factors 
have nice interpretations. Such a transformation is equivalent to rotating the 
common factors in the m-dimensional space. In fact, there are infinitely many possible 
factor rotations available. Kaiser (1958) proposes a varimax criterion to select the rotation 
that works well in many applications. Denote the rotated matrix of factor loadings 
by $\beta^* = [\beta_{ij}^*]$ and the ith communality by $c_i^2$. Define $\tilde{\beta}_{ij}^* = \beta_{ij}^* / c_i$ to be the rotated 
coefficients scaled by the (positive) square root of communalities. The varimax 
procedure selects the orthogonal matrix $P$ that maximizes the quantity 

$$V = \frac{1}{k} \sum_{j=1}^{m} \left[ \sum_{i=1}^{k} (\tilde{\beta}_{ij}^*)^4 - \frac{1}{k} \left( \sum_{i=1}^{k} (\tilde{\beta}_{ij}^*)^2 \right)^2 \right].$$

This complicated expression has a simple interpretation. Maximizing $V$ corresponds 
to spreading out the squares of the loadings on each factor as much as possible. 
Consequently, the procedure is to find groups of large and negligible coefficients 
in any column of the rotated matrix of factor loadings. In a real application, factor 
rotation is used to aid the interpretations of common factors. It may be helpful in 
some applications, but not informative in others. There are many criteria available 
for factor rotation. 
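The criterion can be evaluated directly and compared before and after rotation. The sketch below uses the base R function varimax, which applies Kaiser normalization by default, on the unrotated maximum-likelihood loadings reported in Table 9.4; the helper function vcrit is a hypothetical name introduced here for illustration. The rotated loadings returned by varimax may differ from a published table in the ordering and signs of the columns.

```r
# Varimax criterion V for a k x m loading matrix L (helper name is hypothetical)
vcrit <- function(L) {
  k <- nrow(L)
  Lt <- L / sqrt(rowSums(L^2))            # scale row i by c_i
  sum(colSums(Lt^4) - colSums(Lt^2)^2 / k) / k
}

# Unrotated maximum-likelihood loadings of Table 9.4 (IBM, HPQ, INTC, JPM, BAC)
L <- matrix(c(0.327, 0.348, 0.337, 0.734, 0.960,
              0.530, 0.669, 0.647, 0.186, -0.111), 5, 2)

vm <- varimax(L)                          # normalize = TRUE by default
L.rot <- unclass(vm$loadings)             # rotated loadings as a plain matrix
P <- vm$rotmat                            # the orthogonal rotation matrix

v.before <- vcrit(L)
v.after <- vcrit(L.rot)                   # no smaller than v.before
```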


9.5.3 Applications 


Given the data $\{r_t\}$ of asset returns, the statistical factor analysis enables us to 
search for common factors that explain the variabilities of the returns. Since factor 
analysis assumes no serial correlations in the data, one should check the validity of 
this assumption before using factor analysis. The multivariate portmanteau statistics 
can be used for this purpose. If serial correlations are found, one can build a 
VARMA model to remove the dynamic dependence in the data and apply the factor 
analysis to the residual series. For many return series, the correlation matrix of 
the residuals of a linear model is often very close to the correlation matrix of the 
original data. In this case, the effect of dynamic dependence on factor analysis is 
negligible. 

We consider three examples in this section. The first and third examples use the 
R or S-Plus to perform the analysis and the second example uses Minitab. Other 
packages can also be used. 


Example 9.2. Consider again the monthly log stock returns of IBM, 
Hewlett-Packard, Intel, J.P. Morgan Chase, and Bank of America used in Example 9.1. 
To check the assumption of no serial correlations, we compute the portmanteau 
statistics and obtain $Q_5(1) = 39.99$, $Q_5(5) = 160.60$, and $Q_5(10) = 293.04$. 
Compared with chi-squared distributions with 25, 125, and 250 degrees of freedom, the 
p values of these test statistics are 0.029, 0.017, and 0.032, respectively. Therefore, 
there exists some minor serial dependence in the returns, but the dependence is not 
significant at the 1% level. For simplicity, we ignore the serial dependence in factor 
analysis. 


Table 9.4 shows the results of factor analysis based on the correlation matrix 
using the maximum-likelihood method. We assume that the number of common 
factors is 2, which is reasonable according to the principal component analysis of 
Example 9.1. From the table, the factor analysis reveals several interesting findings: 


• The two factors identified by the maximum-likelihood method explain about 
60% of the variability of the stock returns. 

• Based on the rotated factor loadings, the two common factors have some 
meaningful interpretations. The technology stocks (IBM, Hewlett-Packard, 
and Intel) load heavily on the first factor, whereas the financial stocks (J.P. 
Morgan Chase and Bank of America) load highly on the second factor. These 
two rotated factors jointly differentiate the industrial sectors. 

• In this particular instance, the varimax rotation seems to alter the ordering of 
the two common factors. 

• The specific variance of IBM stock returns is relatively large, indicating that 
the stock has its own features that are worth further investigation. 


TABLE 9.4 Factor Analysis of Monthly Log Stock Returns of IBM, 
Hewlett-Packard, Intel, J.P. Morgan Chase, and Bank of America^a

              Estimates of          Rotated
              Factor Loadings       Factor Loadings       Communalities
Variable      $f_1$      $f_2$      $f_1^*$    $f_2^*$    $1 - \hat{\sigma}_i^2$
Maximum-Likelihood Method 
IBM           0.327      0.530      0.593      0.189      0.387
HPQ           0.348      0.669      0.733      0.177      0.568
INTC          0.337      0.647      0.709      0.171      0.531
JPM           0.734      0.186      0.358      0.667      0.573
BAC           0.960     -0.111      0.124      0.958      0.934
Variance      1.801      1.193      1.535      1.459      2.994
Proportion    0.360      0.239      0.307      0.292      0.599

^a The returns include dividends and are from January 1990 to December 2008. The analysis is based on 
the sample cross-correlation matrix and assumes two common factors. 


Example 9.3. In this example, we consider the monthly log returns of U.S. 
bond indexes with maturities in 30 years, 20 years, 10 years, 5 years, and 1 year. 
The data are described in Example 8.2 but have been transformed into log returns. 
There are 696 observations. As shown in Example 8.2, there is serial dependence 
in the data. However, removing serial dependence by fitting a VARMA(2,1) model 
has hardly any effect on the concurrent correlation matrix. As a matter of fact, the 
correlation matrices before and after fitting a VARMA(2,1) model are 

$$
\rho_o = \begin{bmatrix}
1.0  &      &      &      &     \\
0.98 & 1.0  &      &      &     \\
0.92 & 0.91 & 1.0  &      &     \\
0.85 & 0.86 & 0.90 & 1.0  &     \\
0.63 & 0.64 & 0.67 & 0.81 & 1.0
\end{bmatrix},
\qquad
\hat{\rho} = \begin{bmatrix}
1.0  &      &      &      &     \\
0.98 & 1.0  &      &      &     \\
0.92 & 0.92 & 1.0  &      &     \\
0.85 & 0.86 & 0.90 & 1.0  &     \\
0.66 & 0.67 & 0.71 & 0.84 & 1.0
\end{bmatrix},
$$

where $\rho_o$ is the correlation matrix of the original log returns and $\hat{\rho}$ is 
that obtained after fitting the VARMA(2,1) model. Therefore, we apply 
factor analysis directly to the return series. 

Table 9.5 shows the results of statistical factor analysis of the data. For both 
estimation methods, the first two common factors explain more than 90% of the 
total variability of the data. Indeed, the high communalities indicate that the specific 
variances are very small for the five bond index returns. Because the results of the 
two methods are close, we only discuss that of the principal component method. 
The unrotated factor loadings indicate that (a) all five return series load roughly 
equally on the first factor, and (b) the loadings on the second factor are positively 
correlated with the time to maturity. Therefore, the first common factor represents 
the general U.S. bond returns, and the second factor shows the “time-to-maturity” 
effect. Furthermore, the loadings of the second factor sum approximately to zero. 
Therefore, this common factor can also be interpreted as the contrast between 
long-term and short-term bonds. Here a long-term bond means one with maturity 
10 years or longer. For the rotated factors, the loadings are also interesting. The 
loadings for the first rotated factor are proportional to the time to maturity, whereas 
the loadings of the second factor are inversely proportional to the time to maturity. 
TABLE 9.5 Factor Analysis of Monthly Log Returns of U.S. Bond Indexes with 
Maturities in 30 Years, 20 Years, 10 Years, 5 Years, and 1 Year^a 

              Estimates of          Rotated 
              Factor Loadings       Factor Loadings       Communalities 
Variable      $f_1$     $f_2$       $f_1^*$   $f_2^*$     $1-\sigma_i^2$ 

                        Principal Component Method 
30 years      0.952     0.253       0.927     0.333       0.970 
20 years      0.954     0.240       0.922     0.345       0.968 
10 years      0.956     0.140       0.866     0.429       0.934 
5 years       0.955    -0.142       0.704     0.660       0.931 
1 year        0.800    -0.585       0.325     0.936       0.982 
Variance      4.281     0.504       3.059     1.726       4.785 
Proportion    0.856     0.101       0.612     0.345       0.957 

                        Maximum-Likelihood Method 
30 years      0.849    -0.513       0.895     0.430       0.985 
20 years      0.857    -0.486       0.876     0.451       0.970 
10 years      0.896    -0.303       0.744     0.584       0.895 
5 years       1.000     0.000       0.547     0.837       1.000 
1 year        0.813     0.123       0.342     0.747       0.675 
Variance      3.918     0.607       2.538     1.987       4.525 
Proportion    0.784     0.121       0.508     0.397       0.905 

^a The data are from January 1942 to December 1999. The analysis is based on the sample 
cross-correlation matrix and assumes two common factors. 
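The summary rows of Table 9.5 follow directly from the loadings: a communality is the row sum of squared loadings, the variance explained by a factor is the column sum of squared loadings, and the proportion divides that by the number of series. A minimal sketch using the unrotated principal-component loadings from the table (the variable names are illustrative only):

```python
# Unrotated principal-component loadings (f_1, f_2) from Table 9.5.
loadings = {
    "30 years": (0.952, 0.253),
    "20 years": (0.954, 0.240),
    "10 years": (0.956, 0.140),
    "5 years":  (0.955, -0.142),
    "1 year":   (0.800, -0.585),
}

k = len(loadings)   # number of series
m = 2               # number of common factors

# Communality of each series: sum of squared loadings across factors.
communalities = {name: sum(l * l for l in row) for name, row in loadings.items()}

# Variance explained by each factor: sum of squared loadings across series.
factor_variance = [sum(row[j] ** 2 for row in loadings.values()) for j in range(m)]
proportion = [v / k for v in factor_variance]

# The second-factor loadings nearly cancel, which supports the "contrast" reading.
second_factor_sum = sum(row[1] for row in loadings.values())

for name, c in communalities.items():
    print(f"{name:9s} communality = {c:.3f}")
print("variance  :", [round(v, 3) for v in factor_variance])
print("proportion:", [round(p, 3) for p in proportion])
print("sum of second-factor loadings:", round(second_factor_sum, 3))
```

The results agree with the table up to rounding of the printed loadings.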


Example 9.4. Again, consider the monthly excess returns of the 10 stocks 
in Table 9.2. The sample span is from January 1990 to December 2003 and the 
returns are in percentages. Our goal here is to demonstrate the use of statistical 
factor models using the R or S-Plus command factanal. We started with a 
two-factor model, but it is rejected by the likelihood ratio test of Eq. (9.20). The test 
statistic is LR(2) = 72.96. Based on the asymptotic $\chi^2_{26}$ distribution, the p value of 
the test statistic is close to zero. 
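The degrees of freedom of the likelihood ratio test can be checked with a short calculation. The sketch below assumes the standard formula $[(k-m)^2 - k - m]/2$ for an $m$-factor model fitted to $k$ series by maximum likelihood; Eq. (9.20) is not reproduced here, and the function name is hypothetical.

```python
def lr_degrees_of_freedom(k, m):
    """Degrees of freedom of the LR test that m factors suffice for k series."""
    return ((k - m) ** 2 - k - m) // 2

k = 10  # number of stocks in Table 9.2
for m in (2, 3):
    print(f"m = {m}: {lr_degrees_of_freedom(k, m)} degrees of freedom")
```

For $k = 10$ this gives 26 degrees of freedom for the two-factor model and 18 for the three-factor model, the latter matching the software output shown for the three-factor fit.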


> rtn=read.table("m-barra-9003.txt",header=T) 
> stat.fac=factanal(rtn,factors=2,method='mle') 
> stat.fac 
Sums of squares of loadings: 
  Factor1  Factor2 
 2.696479  2.19149 

Component names: 
 "loadings" "uniquenesses" "correlation" "criteria" 
 "factors" "dof" "method" "center" "scale" "n.obs" 
 "scores" "call" 
We then applied a three-factor model that appears to be reasonable at the 5% 
significance level. The p value of the LR(3) statistic is 0.0892. 


> stat.fac=factanal(rtn,factors=3,method='mle') 
> stat.fac 
Test of the hypothesis that 3 factors are sufficient 
versus the alternative that more are required: 
The chi square statistic is 26.48 on 18 degrees of freedom. 
The p-value is 0.0892 

> summary(stat.fac) 
Importance of factors: 
                Factor1 Factor2 Factor3 
SS loadings       2.635   1.825   1.326 
Proportion Var    0.264   0.183   0.133 
Cumulative Var    0.264   0.446   0.579 

Uniquenesses: 
  AGE     C   MWD   MER  DELL   HPQ   IBM    AA   CAT    PG 
0.479 0.341 0.201 0.216 0.690 0.346 0.638 0.417 0.000 0.885 

Loadings: 
      Factor1 Factor2 Factor3 
AGE     0.678   0.217   0.121 
C       0.739   0.259   0.213 
MWD     0.817   0.356 
MER     0.819   0.329 
DELL    0.102   0.547 
HPQ     0.230   0.771 
IBM     0.200   0.515   0.238 
AA      0.194   0.546   0.497 
CAT     0.198   0.138   0.970 
PG      0.331 

The factor loadings can also be shown graphically using 


> plot(loadings(stat.fac)) 


and the plots are in Figure 9.6. From the plots, factor 1 represents essentially the 
financial service sector, and factor 2 mainly consists of the excess returns from the 
high-tech stocks and the Alcoa stock. Factor 3 depends heavily on excess returns 
of CAT and AA stocks and, hence, represents the remaining industrial sector. 

Factor rotation can be obtained using the command rotate, which allows for 
many rotation methods, and factor realizations are available from the command 
predict. 
Figure 9.6 Plots of factor loadings when a 3-factor statistical factor model is fitted to 10 monthly 
excess stock returns in Table 9.2. [Three bar-chart panels titled Factor1, Factor2, and Factor3. The 
stocks labeled are MER, MWD, C, AGE, PG, and HPQ for Factor1; HPQ, DELL, AA, IBM, MWD, and 
MER for Factor2; and CAT, AA, IBM, C, AGE, and HPQ for Factor3.] 


> stat.fac2 = rotate(stat.fac,rotation='quartimax') 
> loadings(stat.fac2) 
      Factor1 Factor2 Factor3 
AGE     0.700   0.171 
C       0.772   0.216   0.124 
MWD     0.844   0.291 
MER     0.844   0.264 
DELL    0.144   0.536 
HPQ     0.294   0.753 
IBM     0.258   0.518   0.164 
AA      0.278   0.575   0.418 
CAT     0.293   0.219   0.931 
PG      0.334 

> factor.real=predict(stat.fac,type='weighted.ls') 


Finally, we obtained the correlation matrix of the 10 excess returns based on the 
fitted three-factor statistical factor model. As expected, the correlations are closer to 
their sample counterparts than those of the industrial factor model in Section 9.3.1. 
One can also use the global minimum variance portfolio (GMVP) to compare the 
covariance matrices of the returns and the statistical factor model. 


> corr.fit=fitted(stat.fac) 
> print(corr.fit,digits=1,width=2) 
      AGE   C MWD MER DELL HPQ IBM  AA CAT  PG 
AGE   1.0 0.6 0.6 0.6 0.19 0.3 0.3 0.3 0.3 0.2 
C     0.6 1.0 0.7 0.7 0.22 0.4 0.3 0.4 0.4 0.3 
MWD   0.6 0.7 1.0 0.8 0.28 0.5 0.4 0.4 0.3 0.3 
MER   0.6 0.7 0.8 1.0 0.26 0.5 0.4 0.4 0.3 0.3 
DELL  0.2 0.2 0.3 0.3 1.00 0.5 0.3 0.3 0.1 0.0 
HPQ   0.3 0.4 0.5 0.4 0.45 1.0 0.5 0.5 0.2 0.1 
IBM   0.3 0.3 0.4 0.3 0.31 0.5 1.0 0.4 0.3 0.1 
AA    0.3 0.4 0.4 0.4 0.33 0.5 0.4 1.0 0.6 0.1 
CAT   0.3 0.4 0.3 0.3 0.11 0.2 0.3 0.6 1.0 0.1 
PG    0.2 0.3 0.3 0.3 0.03 0.2 0.2 0.1 0.1 1.0 
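Under the orthogonal factor model, the fitted correlation between two series is the inner product of their loading vectors, and each diagonal element is the communality plus the uniqueness. The sketch below illustrates this with the three-factor loadings and uniquenesses printed earlier in this example. It uses only the five stocks for which all three loadings are printed; blank entries in the loadings display are small values suppressed by the software, so the remaining rows cannot be reproduced exactly from the printed output. The helper name is hypothetical.

```python
# Three-factor ML loadings and uniquenesses as printed in Example 9.4.
loadings = {
    "AGE": (0.678, 0.217, 0.121),
    "C":   (0.739, 0.259, 0.213),
    "IBM": (0.200, 0.515, 0.238),
    "AA":  (0.194, 0.546, 0.497),
    "CAT": (0.198, 0.138, 0.970),
}
uniqueness = {"AGE": 0.479, "C": 0.341, "IBM": 0.638, "AA": 0.417, "CAT": 0.000}

def fitted_corr(a, b):
    """Model-implied correlation: loadings inner product (+ uniqueness on the diagonal)."""
    inner = sum(x * y for x, y in zip(loadings[a], loadings[b]))
    return inner + uniqueness[a] if a == b else inner

names = list(loadings)
print("      " + "  ".join(f"{n:>5s}" for n in names))
for a in names:
    print(f"{a:5s} " + "  ".join(f"{fitted_corr(a, b):5.2f}" for b in names))
```

Rounded to one digit, the implied values (for example, about 0.6 for AGE and C, and about 0.6 for AA and CAT) agree with the corresponding entries of the fitted correlation matrix above.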


9.6 ASYMPTOTIC PRINCIPAL COMPONENT ANALYSIS 


So far, our discussion of PCA assumes that the number of assets is smaller than the 
number of time periods, that is, $k < T$. To deal with situations of a small $T$ and 
large $k$, Connor and Korajczyk (1986, 1988) developed the concept of asymptotic 
principal component analysis (APCA), which is similar to the traditional PCA but 
relies on the asymptotic results as the number of assets $k$ increases to infinity. Thus, 
the APCA is based on eigenvalue-eigenvector analysis of the $T \times T$ matrix 

$$
\hat{\Omega}_T = \frac{1}{k}(R - 1_T\bar{r}')(R - 1_T\bar{r}')',
$$

where $1_T$ is the $T$-dimensional vector of ones and $\bar{r} = (\bar{r}_1, \ldots, \bar{r}_k)'$ with 
$\bar{r}_i = (1_T'R_i)/T$ being the sample mean of the $i$th return series. Connor and Korajczyk 
(1988) showed that as $k \to \infty$ eigenvalue-eigenvector analysis of $\hat{\Omega}_T$ is equivalent 
to the traditional statistical factor analysis. In other words, the APCA estimates of 
the factors $f_t$ are the first $m$ eigenvectors of $\hat{\Omega}_T$. Let $\hat{F}$ be the $m \times T$ matrix 
consisting of the first $m$ eigenvectors of $\hat{\Omega}_T$. Then $\hat{f}_t$ is the $t$th column of $\hat{F}$. 
Using an idea similar to the estimation of BARRA factor models, Connor and 
Korajczyk (1988) propose refining the estimation of $\hat{f}_t$ as follows: 


1. Use the sample covariance matrix $\hat{\Omega}_T$ to obtain an initial estimate of $\hat{f}_t$ for 
$t = 1, \ldots, T$. 

2. For each asset, perform the OLS estimation of the model 

$$
r_{it} = \alpha_i + \beta_i'\hat{f}_t + \epsilon_{it}, \qquad t = 1, \ldots, T,
$$

where $\beta_i = (\beta_{i1}, \ldots, \beta_{im})'$, and compute the residual variance $\hat{\sigma}_i^2$. 

3. Form the diagonal matrix $\hat{D} = \mathrm{diag}\{\hat{\sigma}_1^2, \ldots, \hat{\sigma}_k^2\}$ and rescale the returns as 
$R_* = R\hat{D}^{-1/2}$. 

4. Compute the $T \times T$ covariance matrix using $R_*$ as 

$$
\hat{\Omega}_* = \frac{1}{k}(R_* - 1_T\bar{r}_*')(R_* - 1_T\bar{r}_*')',
$$

where $\bar{r}_*$ is the $k$-dimensional vector of the column means of $R_*$, and perform 
eigenvalue-eigenvector analysis of $\hat{\Omega}_*$ to obtain a refined estimate of $f_t$. 
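The initial APCA step can be sketched in a few lines. The code below is an illustration under stated assumptions, not the mfactor implementation: it simulates a hypothetical one-factor model with $T = 12$ and $k = 200$, forms the $T \times T$ matrix from the column-demeaned returns (scaled by $1/k$; the scaling does not affect the eigenvectors), and extracts the leading eigenvector by power iteration. The refinement in steps 2 to 4 is omitted, and all function names and simulated values are made up for the example.

```python
import math
import random

def demean_columns(R):
    """Subtract each asset's time-series mean from its column (R is T x k)."""
    T, k = len(R), len(R[0])
    means = [sum(R[t][i] for t in range(T)) / T for i in range(k)]
    return [[R[t][i] - means[i] for i in range(k)] for t in range(T)]

def omega_T(R):
    """T x T cross-product matrix of the demeaned returns, scaled by 1/k."""
    X = demean_columns(R)
    T, k = len(X), len(X[0])
    return [[sum(X[t][i] * X[s][i] for i in range(k)) / k for s in range(T)]
            for t in range(T)]

def leading_eigenvector(M, iters=200, seed=1):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in M]
    for _ in range(iters):
        w = [sum(M[t][s] * v[s] for s in range(len(v))) for t in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one common factor, many assets, few time periods.
rng = random.Random(0)
T, k = 12, 200
f = [0.5, -1.2, 0.8, 1.5, -0.3, -0.9, 1.1, 0.2, -1.4, 0.6, -0.1, 0.9]
beta = [1.0 + rng.random() for _ in range(k)]
R = [[beta[i] * f[t] + 0.5 * rng.gauss(0.0, 1.0) for i in range(k)]
     for t in range(T)]

f_hat = leading_eigenvector(omega_T(R))
rho = correlation(f_hat, f)
print("correlation between estimated and true factor:", round(rho, 3))
```

The estimated factor is determined only up to sign and scale, so it is the magnitude of the correlation with the simulated factor that indicates how well the factor is recovered.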


9.6.1 Selecting the Number of Factors 


Two methods are available in the literature to help select the number of factors 
in factor analysis. The first method proposed by Connor and Korajczyk (1993) 
makes use of the idea that if $m$ is the proper number of common factors, then 
there should be no significant decrease in the cross-sectional variance of the asset 
specific error $\epsilon_{it}$ when the number of factors moves from $m$ to $m + 1$. The second 
method proposed by Bai and Ng (2002) adopts some information criteria to select 
the number of factors. This latter method is based on the observation that the 
eigenvalue-eigenvector analysis of $\hat{\Omega}_T$ solves the least-squares problem 

$$
\min_{\alpha, \beta, f_t} \frac{1}{kT}\sum_{i=1}^{k}\sum_{t=1}^{T}(r_{it} - \alpha_i - \beta_i'f_t)^2.
$$

Assume that there are $m$ factors so that $f_t$ is $m$-dimensional. Let $\hat{\sigma}_i^2(m)$ be the 
residual variance of the inner regression of the prior least-squares problem for 
asset $i$. This is done by using $\hat{f}_t$ obtained from the APCA analysis. Define the 
cross-sectional average of the residual variances as 

$$
\hat{\sigma}^2(m) = \frac{1}{k}\sum_{i=1}^{k}\hat{\sigma}_i^2(m).
$$

The criteria proposed by Bai and Ng (2002) are 

$$
C_{p1}(m) = \hat{\sigma}^2(m) + m\hat{\sigma}^2(M)\left(\frac{k+T}{kT}\right)\ln\left(\frac{kT}{k+T}\right),
$$

$$
C_{p2}(m) = \hat{\sigma}^2(m) + m\hat{\sigma}^2(M)\left(\frac{k+T}{kT}\right)\ln(P_{kT}^2),
$$

where $M$ is a prespecified positive integer denoting the maximum number of factors 
and $P_{kT} = \min(\sqrt{k}, \sqrt{T})$. One selects $m$ that minimizes either $C_{p1}(m)$ or $C_{p2}(m)$ 
for $0 \le m \le M$. In practice, the two criteria may select different numbers of factors. 
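The two criteria are simple to evaluate once the average residual variances are available. The sketch below is illustrative only: the residual variances are hypothetical numbers, not values from any data set in this chapter, and the function name is made up. It uses $k = 40$ and $T = 36$ and shows a case in which the two criteria disagree.

```python
import math

def bai_ng_criteria(sigma2, k, T):
    """Return (Cp1, Cp2) as lists over m = 0..M.

    sigma2[m] is the cross-sectional average residual variance with m factors,
    and M = len(sigma2) - 1 is the maximum number of factors considered.
    """
    M = len(sigma2) - 1
    scale = (k + T) / (k * T)
    pen1 = sigma2[M] * scale * math.log(k * T / (k + T))
    # ln(P_kT^2) with P_kT = min(sqrt(k), sqrt(T)) equals ln(min(k, T)).
    pen2 = sigma2[M] * scale * math.log(min(k, T))
    cp1 = [sigma2[m] + m * pen1 for m in range(M + 1)]
    cp2 = [sigma2[m] + m * pen2 for m in range(M + 1)]
    return cp1, cp2

# Hypothetical average residual variances for m = 0, 1, ..., 5.
sigma2 = [1.00, 0.60, 0.45, 0.39, 0.38, 0.37]
cp1, cp2 = bai_ng_criteria(sigma2, k=40, T=36)

m1 = min(range(len(cp1)), key=cp1.__getitem__)
m2 = min(range(len(cp2)), key=cp2.__getitem__)
print("Cp1 selects m =", m1)
print("Cp2 selects m =", m2)
```

Because $\ln(P_{kT}^2)$ exceeds $\ln[kT/(k+T)]$ here, $C_{p2}$ penalizes additional factors more heavily and selects the smaller model for these made-up inputs.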
9.6.2 An Example 


To demonstrate asymptotic principal component analysis, we consider monthly 
simple returns of 40 stocks from January 2001 to December 2003 for 36 observations. 
Thus, we have $k = 40$ and $T = 36$. The tick symbols of stocks used are given in 
Table 9.6. These stocks are among those heavily traded on NASDAQ and the NYSE 
on a particular day of September 2004. The main S-Plus command used is mfactor. 

To select the number of factors, we used the two methods discussed earlier. 
The Connor-Korajczyk method selects $m = 1$, whereas the Bai-Ng method selects 
$m = 6$. For the latter method, the two criteria provide different results. 


> dim(rtn) % rtn is the return data. 
[1] 36 40 
> nf.ck=mfactor(rtn,k='ck',max.k=10,sig=0.05) 
> nf.ck 
Call: 
mfactor(x = rtn, k = "ck", max.k = 10, sig = 0.05) 

Factor Model: 
 Factors Variables Periods 
       1        40      36 

Factor Loadings: 
      Min. 1st Qu. Median  Mean 3rd Qu.  Max. 
F.1  0.069   0.432  0.629 0.688   1.071 1.612 

Regression R-squared: 
  Min. 1st Qu. Median  Mean 3rd Qu.  Max. 
 0.090   0.287  0.487 0.456   0.574 0.831 

> nf.bn=mfactor(rtn,k='bn',max.k=10,sig=0.05) 
Warning messages: 
  Cp1 and Cp2 did not yield same result. The smaller one 
  is used. 
> nf.bn$k 
[1] 6 


TABLE 9.6 Tick Symbols of Stocks Used in Asymptotic Principal Component 
Analysis for Sample Period from January 2001 to December 2003 

Market     Tick Symbol 

NASDAQ     INTC   MSFT   SUNW   CSCO   AMAT 
           ORCL   SIRI   COCO   CORV   SUPG 
           YHOO   JDSU   QCOM   CIEN   DELL 
           ERTS   EBAY   ADCT   AAPL   JNPR 

NYSE       LU     PFE    NT     BAC    BSX 
           GE     TXN    XOM    FRX    Q 
           F      TWX    C      MOT    JPM 
           TYC    HPQ    NOK    WMT    AMD 
Using $m = 6$, we apply APCA to the returns. The scree plot and estimated factor 
returns can also be obtained. 


> apca = mfactor(rtn,k=6) 
> apca 
Call: 
mfactor(x = rtn, k = 6) 

Factor Model: 
 Factors Variables Periods 
       6        40      36 

Factor Loadings: 
       Min. 1st Qu. Median  Mean 3rd Qu.  Max. 
F.1   0.048   0.349  0.561 0.643   0.952 2.222 
F.2  -1.737   0.084  0.216 0.214   0.323 1.046 
F.3  -1.512   0.002  0.076 0.102   0.255 1.093 
F.4  -0.965  -0.035  0.078 0.048   0.202 0.585 
F.5  -0.722  -0.008  0.056 0.066   0.214 0.729 
F.6  -0.840  -0.088  0.003 0.003   0.071 0.635 

Regression R-squared: 
  Min. 1st Qu. Median  Mean 3rd Qu.  Max. 
 0.219   0.480  0.695 0.651   0.801 0.999 

> screeplot.mfactor(apca) 
> fplot(factors(apca)) 


Figure 9.7 shows the scree plot of the APCA for the 40 stock returns. The 6 
common factors used explain about 89.4% of the variability. Figure 9.8 gives the 
time plots of the returns of the 6 estimated factors. 


Figure 9.7 Scree plot of asymptotic principal component analysis applied to monthly simple returns 
of 40 stocks. Sample period is from January 2001 to December 2003. 

Figure 9.8 Time plots of factor returns derived from applying asymptotic principal component analysis 
to monthly simple returns of 40 stocks. Sample period is from January 2001 to December 2003. 


EXERCISES 


9.1. Consider the monthly simple excess returns, in percentages and including 
dividends, of 13 stocks and the S&P 500 composite index from January 1990 
to December 2008. The monthly 3-month Treasury bill rate in the secondary 
market is used as the risk-free interest rate to compute the excess returns. The 
tick symbols for the stocks are AA, AXP, CAT, DE, F, FDX, HPQ, IBM, JNJ, 
KMB, MMM, PG, and WFC. The data are in the file m-fac-ex-9008.txt. 
Perform the market model analysis of Section 9.2.1 for the 13 stock returns 
to obtain the estimates of $\beta_i$, $\sigma_i^2$, and $R^2$ for each stock return series. 

9.2. Consider the monthly log stock returns, in percentages and including 
dividends, of Merck & Company, Johnson & Johnson, General Electric, 
General Motors, Ford Motor Company, and the value-weighted index from 
January 1960 to December 2008; see the file m-mrk2vw.txt of Exercise 8.1 
of Chapter 8. 

(a) Perform a principal component analysis of the data using the sample 
covariance matrix. 

(b) Perform a principal component analysis of the data using the sample 
correlation matrix. 

(c) Perform a statistical factor analysis on the data. Identify the number of 
common factors. Obtain estimates of factor loadings using both the 
principal component and maximum-likelihood methods. 

9.3. The file m-excess-c10sp-9003.txt contains the monthly simple excess 
returns of 10 stocks and the S&P 500 index. The 3-month Treasury bill rate 
on the secondary market is used to compute the excess returns. The sample 
period is from January 1990 to December 2003 for 168 observations. The 11 
columns in the file contain the returns for ABT, LLY, MRK, PFE, F, GM, 
BP, CVX, RD, XOM, and SP5, respectively. Analyze the 10 stock excess 
returns using the single-factor market model. Plot the beta estimate and $R^2$ 
for each stock, and use the global minimum variance portfolio to compare the 
covariance matrices of the fitted model and the data. 

9.4. Again, consider the 10 stock returns in m-excess-c10sp-9003.txt. The 
stocks are from companies in 3 industrial sectors. ABT, LLY, MRK, and PFE 
are major drug companies, F and GM are automobile companies, and the rest 
are big oil companies. Analyze the excess returns using the BARRA industrial 
factor model. Plot the 3-factor realizations and comment on the adequacy of 
the fitted model. 

9.5. Again, consider the 10 excess stock returns in the file 
m-excess-c10sp-9003.txt. Perform a principal component analysis on the returns 
and obtain the scree plot. How many common factors are there? Why? Interpret the 
common factors. 

9.6. Again, consider the 10 excess stock returns in the file 
m-excess-c10sp-9003.txt. Perform a statistical factor analysis. How many common 
factors are there if the 5% significance level is used? Plot the estimated factor 
loadings of the fitted model. Are the common factors meaningful? 

9.7. The file m-fedip.txt contains year, month, effective federal funds rate, 
and the industrial production index from July 1954 to December 2003. The 
industrial production index is seasonally adjusted. Use the federal funds rate 
and the industrial production index as the macroeconomic variables. Fit a 
macroeconomic factor model to the 10 excess returns in 
m-excess-c10sp-9003.txt. You can use a VAR model to obtain the surprise series 
of the macroeconomic variables. Comment on the fitted factor model. 
CHAPTER 10 


Multivariate Volatility Models 
and Their Applications 


In this chapter, we generalize the univariate volatility models of Chapter 3 to 
the multivariate case and discuss some simple methods for modeling the dynamic 
relationships between volatility processes of multiple asset returns. By multivariate 
volatility, we mean the conditional covariance matrix of multiple asset returns. 
Multivariate volatilities have many important financial applications. They play an 
important role in portfolio selection and asset allocation, and they can be used to 
compute the value at risk of a financial position consisting of multiple assets. 

Consider a multivariate return series $\{r_t\}$. We adopt the same approach as the 
univariate case by rewriting the series as 

$$
r_t = \mu_t + a_t,
$$

where $\mu_t = E(r_t|F_{t-1})$ is the conditional expectation of $r_t$ given the past 
information $F_{t-1}$, and $a_t = (a_{1t}, \ldots, a_{kt})'$ is the shock, or innovation, of the series at time 
$t$. In addition, we assume that $r_t$ follows a multivariate time series model of Chapter 
8 so that $\mu_t$ is the 1-step-ahead prediction of the model. For most return series, it 
suffices to employ a simple vector ARMA structure with exogenous variables for 
$\mu_t$, that is, 

$$
\mu_t = \Upsilon x_t + \sum_{i=1}^{p}\Phi_i r_{t-i} - \sum_{i=1}^{q}\Theta_i a_{t-i}, \qquad (10.1)
$$

where $x_t$ denotes an $m$-dimensional vector of exogenous (or explanatory) variables 
with $x_{1t} = 1$, $\Upsilon$ is a $k \times m$ matrix, and $p$ and $q$ are nonnegative integers. We refer 
to Eq. (10.1) as the mean equation of $r_t$. 

The conditional covariance matrix of $a_t$ given $F_{t-1}$ is a $k \times k$ positive-definite 
matrix $\Sigma_t$ defined by $\Sigma_t = \mathrm{Cov}(a_t|F_{t-1})$. Multivariate volatility modeling is 
concerned with the time evolution of $\Sigma_t$. We refer to a model for the $\{\Sigma_t\}$ process as 
a volatility model for the return series $r_t$. 

There are many ways to generalize univariate volatility models to the 
multivariate case, but the curse of dimensionality quickly becomes a major obstacle 
in applications because there are $k(k + 1)/2$ quantities in $\Sigma_t$ for a $k$-dimensional 
return series. To illustrate, there are 15 conditional variances and covariances in 
$\Sigma_t$ for a five-dimensional return series. The goal of this chapter is to introduce 
some relatively simple multivariate volatility models that are useful, yet remain 
manageable in real application. In particular, we discuss some models that allow 
for time-varying correlation coefficients between asset returns. Time-varying 
correlations are useful in finance. For example, they can be used to estimate the 
time-varying beta of the market model for a return series. 

We begin by using an exponentially weighted approach to estimate the covari- 
ance matrix in Section 10.1. This estimated covariance matrix can serve as a 
benchmark for multivariate volatility estimation. Section 10.2 discusses some 
generalizations of univariate GARCH models that are available in the literature. We 
then introduce two methods to reparameterize $\Sigma_t$ for volatility modeling in Section 
10.3. The reparameterization based on the Cholesky decomposition is found to be 
useful. We study some volatility models for bivariate returns in Section 10.4, using 
the GARCH model as an example. In this particular case, the volatility model can 
be bivariate or three dimensional. Section 10.5 is concerned with volatility models 
for higher dimensional returns and Section 10.6 addresses the issue of dimension 
reduction. We demonstrate some applications of multivariate volatility models in 
Section 10.7. Finally, Section 10.8 gives a multivariate Student-t distribution useful 
for volatility modeling. 


10.1 EXPONENTIALLY WEIGHTED ESTIMATE 


Given the innovations $F_{t-1} = \{a_1, \ldots, a_{t-1}\}$, the (unconditional) covariance matrix 
of the innovation can be estimated by 

$$
\hat{\Sigma} = \frac{1}{t-1}\sum_{j=1}^{t-1} a_j a_j',
$$

where it is understood that the mean of $a_j$ is zero. This estimate assigns equal 
weight $1/(t - 1)$ to each term in the summation. To allow for a time-varying 
covariance matrix and to emphasize that recent innovations are more relevant, one 
can use the idea of exponential smoothing and estimate the covariance matrix of 
$a_t$ by 

$$
\hat{\Sigma}_t = \frac{1-\lambda}{1-\lambda^{t-1}}\sum_{j=1}^{t-1}\lambda^{j-1}a_{t-j}a_{t-j}', \qquad (10.2)
$$

where $0 < \lambda < 1$ and the weights $(1 - \lambda)\lambda^{j-1}/(1 - \lambda^{t-1})$ sum to one. For a 
sufficiently large $t$ such that $\lambda^{t-1} \approx 0$, the prior equation can be rewritten as 

$$
\hat{\Sigma}_t = (1-\lambda)a_{t-1}a_{t-1}' + \lambda\hat{\Sigma}_{t-1}.
$$

Therefore, the covariance estimate in Eq. (10.2) is referred to as the exponentially 
weighted moving-average (EWMA) estimate of the covariance matrix. 
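The EWMA recursion is straightforward to code. A minimal sketch follows, assuming the large-$t$ form of the update (the new estimate is $1-\lambda$ times the outer product of the previous shock plus $\lambda$ times the previous estimate). The shocks, the starting matrix, and $\lambda = 0.5$ are made-up values chosen so the arithmetic is easy to follow, and the function names are hypothetical.

```python
def outer(a):
    """Outer product a a' of a vector given as a sequence."""
    return [[x * y for y in a] for x in a]

def ewma_update(sigma_prev, a_prev, lam):
    """One EWMA step: (1 - lam) * a_prev a_prev' + lam * sigma_prev."""
    aa = outer(a_prev)
    n = len(a_prev)
    return [[(1.0 - lam) * aa[i][j] + lam * sigma_prev[i][j] for j in range(n)]
            for i in range(n)]

def ewma_path(shocks, sigma_init, lam):
    """Return the sequence of estimates, starting from sigma_init.

    Each new estimate uses only the most recent shock and the previous estimate.
    """
    path = [sigma_init]
    for a in shocks:
        path.append(ewma_update(path[-1], a, lam))
    return path

# Hypothetical bivariate shocks and an identity starting value.
shocks = [(2.0, 0.0), (1.0, -1.0)]
path = ewma_path(shocks, [[1.0, 0.0], [0.0, 1.0]], lam=0.5)

for t, sigma in enumerate(path, start=1):
    print(f"estimate {t}: {sigma}")
```

A value of $\lambda$ close to one, as is typical for daily returns, gives more weight to the previous estimate and hence a smoother covariance path than the $\lambda = 0.5$ used here.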

Suppose that the return data are $\{r_1, \ldots, r_T\}$. For a given $\lambda$ and initial estimate 
$\hat{\Sigma}_1$, $\hat{\Sigma}_t$ can be computed recursively. If one assumes that $a_t = r_t - \mu_t$ follows a 
multivariate normal distribution with mean zero and covariance matrix $\Sigma_t$, where 
$\mu_t$ is a function of parameter $\Theta$, then $\lambda$ and $\Theta$ can be estimated jointly by the 
maximum-likelihood method because the log-likelihood function of the data is 

$$
\ln L(\Theta, \lambda) \propto -\frac{1}{2}\sum_{t=1}^{T}\ln(|\Sigma_t|) - \frac{1}{2}\sum_{t=1}^{T}(r_t - \mu_t)'\Sigma_t^{-1}(r_t - \mu_t),
$$

which can be evaluated recursively by substituting $\hat{\Sigma}_t$ for $\Sigma_t$. 


Example 10.1. To illustrate, consider the daily log returns of the Hang Seng 
index of Hong Kong and the Nikkei 225 index of Japan from January 4, 2006, to 
December 30, 2008, for 713 observations. The indexes were obtained from Yahoo 
Finance. For simplicity, we only employ data when both markets were open to 
calculate the log returns, which are in percentages. Figure 10.1 shows the time 
plots of the two index returns. The effect of the recent global financial crisis is clearly 
seen from the plots. Let $r_{1t}$ and $r_{2t}$ be the log returns of the Hong Kong and 
Japanese markets, respectively. If univariate GARCH models are entertained, we 
obtain the models 

$$
r_{1t} = 0.109 + a_{1t}, \qquad a_{1t} = \sigma_{1t}\epsilon_{1t},
$$
$$
\sigma_{1t}^2 = 0.038 + 0.143a_{1,t-1}^2 + 0.855\sigma_{1,t-1}^2, \qquad (10.3)
$$
$$
r_{2t} = 0.003 + a_{2t}, \qquad a_{2t} = \sigma_{2t}\epsilon_{2t},
$$
$$
\sigma_{2t}^2 = 0.044 + 0.127a_{2,t-1}^2 + 0.861\sigma_{2,t-1}^2, \qquad (10.4)
$$

where all of the parameter estimates are significant at the 5% level except for 
the constant term of the mean equation for the Nikkei 225 index returns. The 
Ljung—Box statistics of the standardized residuals and their squared series of the 
two univariate models fail to indicate any model inadequacy. The two volatility 
equations are close to an IGARCH(1,1) model. This is reasonable because of the 
increased volatility caused by the subprime financial crisis. Figure 10.2 shows 
the estimated volatilities of the two univariate GARCH(1,1) models. Indeed, the 
volatility series confirm that both markets were more volatile than usual in 2008. 
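How close the two fitted volatility equations are to an IGARCH(1,1) model can be read from the sum of the ARCH and GARCH coefficients. The sketch below uses the estimates in Eqs. (10.3) and (10.4). The implied unconditional variance $\omega/(1-\alpha-\beta)$ is the standard GARCH(1,1) result when $\alpha + \beta < 1$; it is added here only as an illustration and is not reported in the example. The dictionary keys and function names are hypothetical.

```python
# Volatility-equation estimates from Eqs. (10.3) and (10.4).
models = {
    "Hang Seng":  {"omega": 0.038, "alpha": 0.143, "beta": 0.855},
    "Nikkei 225": {"omega": 0.044, "alpha": 0.127, "beta": 0.861},
}

def persistence(p):
    """Sum of ARCH and GARCH coefficients; a value of 1 corresponds to IGARCH(1,1)."""
    return p["alpha"] + p["beta"]

def unconditional_variance(p):
    """Implied long-run variance omega / (1 - alpha - beta), valid if persistence < 1."""
    return p["omega"] / (1.0 - persistence(p))

def next_variance(p, a_prev, sigma2_prev):
    """One step of the GARCH(1,1) variance recursion."""
    return p["omega"] + p["alpha"] * a_prev ** 2 + p["beta"] * sigma2_prev

for name, p in models.items():
    print(f"{name}: persistence = {persistence(p):.3f}, "
          f"implied unconditional variance = {unconditional_variance(p):.2f}")
```

Both sums are just below one, so shocks to volatility die out very slowly, and the implied long-run variance is sensitive to small changes in the estimates.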

Figure 10.1 Time plots of daily log returns in percentages of stock market indexes for Hong Kong and 
Japan from January 4, 2006, to December 30, 2008: (a) Hong Kong market and (b) Japanese market. 

Figure 10.2 Estimated volatilities (standard error) for daily log returns in percentages of stock market 
indexes for Hong Kong and Japan from January 4, 2006, to December 30, 2008: (a) Hong Kong market 
and (b) Japanese market. Univariate models are used. 

We now turn to bivariate modeling. We apply the EWMA approach to obtain volatility 
estimates, using the command mgarch in S-Plus FinMetrics: 

> m3=mgarch(formula.mean=~arma(0,0),formula.var=~ewma1, 
    series=rtn,trace=F) 
> summary(m3) 

Call: 
mgarch(formula.mean =~arma(0,0), formula.var=~ewma1, 
    series=rtn,trace = F) 

Mean Equation: structure(.Data = ~arma(0,0), class="formula") 

Conditional Var. Eq.: structure(.Data=~ewma1,class="formula") 

Conditional Distribution: gaussian 

           Value  Std.Error  t value  Pr(>|t|) 
C(1)    0.082425   0.030900   2.6675  0.007816 
C(2)   -0.006849   0.030093  -0.2276  0.820020 
ALPHA   0.069492   0.004945  14.0517  0.000000 

The estimate of $\lambda$ is $1 - \hat{\alpha} = 1 - 0.0695 \approx 0.9305$, which is in the typical range 
commonly seen in practice. Figure 10.3 shows the estimated volatility series by the 
EWMA approach. Compared with those in Figure 10.2, the EWMA approach 
produces smoother volatility series, even though the two plots show similar volatility 
patterns. 

Figure 10.3 Estimated volatilities (standard error) for daily log returns in percentages of stock market 
indices for Hong Kong and Japan from January 4, 2006, to December 30, 2008: (a) Hong Kong market 
and (b) Japanese market. Exponentially weighted moving-average approach is used. 


10.2 SOME MULTIVARIATE GARCH MODELS 


Many authors have generalized univariate volatility models to the multivariate case. 
In this section, we discuss some of the generalizations. For more details, readers 
are referred to the survey article of Bauwens, Laurent, and Rombouts (2004). 


10.2.1 Diagonal Vectorization (VEC) Model 


Bollerslev, Engle, and Wooldridge (1988) generalize the exponentially weighted 
moving-average approach to propose the model 

$$
\Sigma_t = A_0 + \sum_{i=1}^{m} A_i \odot (a_{t-i}a_{t-i}') + \sum_{j=1}^{s} B_j \odot \Sigma_{t-j}, \qquad (10.5)
$$

where $m$ and $s$ are nonnegative integers, $A_i$ and $B_j$ are symmetric matrices, and $\odot$ 
denotes the Hadamard product, that is, element-by-element multiplication. This is 
referred to as the diagonal VEC($m$, $s$) model or DVEC($m$, $s$) model. To appreciate 
the model, consider the bivariate DVEC(1,1) case satisfying 

$$
\begin{bmatrix} \sigma_{11,t} & \\ \sigma_{21,t} & \sigma_{22,t} \end{bmatrix}
= \begin{bmatrix} A_{11,0} & \\ A_{21,0} & A_{22,0} \end{bmatrix}
+ \begin{bmatrix} A_{11,1} & \\ A_{21,1} & A_{22,1} \end{bmatrix}
\odot \begin{bmatrix} a_{1,t-1}^2 & \\ a_{1,t-1}a_{2,t-1} & a_{2,t-1}^2 \end{bmatrix}
+ \begin{bmatrix} B_{11,1} & \\ B_{21,1} & B_{22,1} \end{bmatrix}
\odot \begin{bmatrix} \sigma_{11,t-1} & \\ \sigma_{21,t-1} & \sigma_{22,t-1} \end{bmatrix},
$$

where only the lower triangular part of the model is given. Specifically, the model is 

$$
\sigma_{11,t} = A_{11,0} + A_{11,1}a_{1,t-1}^2 + B_{11,1}\sigma_{11,t-1},
$$
$$
\sigma_{21,t} = A_{21,0} + A_{21,1}a_{1,t-1}a_{2,t-1} + B_{21,1}\sigma_{21,t-1},
$$
$$
\sigma_{22,t} = A_{22,0} + A_{22,1}a_{2,t-1}^2 + B_{22,1}\sigma_{22,t-1},
$$

where each element of $\Sigma_t$ depends only on its own past value and the corresponding 
product term in $a_{t-1}a_{t-1}'$. That is, each element of a DVEC model follows a 
GARCH(1,1)-type model. The model is, therefore, simple. However, it may not 
produce a positive-definite covariance matrix. Furthermore, the model does not 
allow for dynamic dependence between volatility series. 
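The remark that a DVEC model may fail to produce a positive-definite covariance matrix can be seen in a small numerical sketch. The coefficient matrices below are hypothetical and deliberately extreme (a large cross-product coefficient relative to the variance coefficients); they are not estimates from any data. The function names are made up for the illustration.

```python
def dvec11_update(A0, A1, B1, a_prev, sigma_prev):
    """One step of a DVEC(1,1) model, applied element by element:
    sigma_ij,t = A0_ij + A1_ij * a_i,t-1 * a_j,t-1 + B1_ij * sigma_ij,t-1.
    """
    n = len(a_prev)
    return [[A0[i][j] + A1[i][j] * a_prev[i] * a_prev[j] + B1[i][j] * sigma_prev[i][j]
             for j in range(n)] for i in range(n)]

def det2(S):
    """Determinant of a 2 x 2 matrix."""
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]

# Hypothetical symmetric coefficient matrices.
A0 = [[0.1, 0.1], [0.1, 0.1]]
A1 = [[0.1, 0.9], [0.9, 0.1]]
B1 = [[0.5, 0.5], [0.5, 0.5]]

sigma_prev = [[0.2, 0.0], [0.0, 0.2]]
a_prev = (1.0, 1.0)

sigma = dvec11_update(A0, A1, B1, a_prev, sigma_prev)
print("updated matrix:", sigma)
print("determinant:", det2(sigma))
```

Here both variances stay positive, but the implied covariance exceeds them in magnitude, so the determinant is negative and the implied correlation would lie outside $[-1, 1]$.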
Figure 10.4 Time plot of monthly simple returns, including dividends, for Pfizer and Merck stocks 
from January 1965 to December 2008: (a) Pfizer stock and (b) Merck stock. 


Example 10.2. For illustration, consider the monthly simple returns, including 
dividends, of two major U.S. drug companies from January 1965 to December 2008 
for 528 observations. Let $r_{1t}$ and $r_{2t}$ be the monthly returns of Pfizer and Merck 
stock, respectively. The bivariate return series $r_t = (r_{1t}, r_{2t})'$, shown in Figure 10.4, 
has no significant serial correlations with Q(10) being 10.48(0.40) and 11.42(0.33), 
respectively, for the two series. Therefore, the mean equation of $r_t$ consists of a 
constant term only. We fit a DVEC(1,1) model to the series using the command 
mgarch in FinMetrics of S-Plus: 


> rtn=cbind(pfe,mrk) % Output edited. 
> mdvec=mgarch(rtn~1,~dvec(1,1)) 
> summary(mdvec) 

Call: 
mgarch(formula.mean=rtn ~ 1, formula.var= ~ dvec(1, 1)) 

Mean Equation: structure(.Data =rtn ~ 1, class="formula") 

Conditional Var. Eq.: structure(.Data=~dvec(1,1), 
    class="formula") 

Conditional Distribution: gaussian 

                     Value  Std.Error  t value   Pr(>|t|) 
C(1)             1.350e-02  3.149e-03    4.285  2.174e-05 
C(2)             1.313e-02  3.043e-03    4.314  1.921e-05 
A(1, 1)          7.544e-04  3.939e-04    1.916  5.597e-02 
A(2, 1)          7.543e-05  3.468e-05    2.175  3.010e-02 
A(2, 2)          7.941e-05  3.871e-05    2.051  4.072e-02 
ARCH(1; 1, 1)    7.078e-02  2.757e-02    2.568  1.051e-02 
ARCH(1; 2, 1)    2.513e-02  8.492e-03    2.960  3.220e-03 
ARCH(1; 2, 2)    4.095e-02  1.213e-02    3.375  7.939e-04 
GARCH(1; 1, 1)   7.858e-01  9.055e-02    8.677  0.000e+00 
GARCH(1; 2, 1)   9.499e-01  1.671e-02   56.831  0.000e+00 
GARCH(1; 2, 2)   9.454e-01  1.469e-02   64.358  0.000e+00 

     Statistic  P-value  Chi^2-d.f. 
pfe      9.531   0.6570          12 
mrk     12.349   0.4181          12 

Ljung-Box test for squared standardized residuals: 

     Statistic  P-value  Chi^2-d.f. 
pfe     22.077  0.03666          12 
mrk      6.437  0.89246          12 

> names(mdvec) 
 [1] "residuals"  "sigma.t"        "df.residual"  "coef" 
 [5] "model"      "cond.dist"      "likelihood"   "opt.index" 
 [9] "cov"        "std.residuals"  "R.t"          "S.t" 
[13] "prediction" "call"           "series" 


From the output, all parameter estimates except A(1,1) are significant at the 5% level, 
and the fitted volatility model is 

$$
\sigma_{11,t} = 0.00075 + 0.071a_{1,t-1}^2 + 0.786\sigma_{11,t-1},
$$
$$
\sigma_{21,t} = 0.00008 + 0.025a_{1,t-1}a_{2,t-1} + 0.950\sigma_{21,t-1},
$$
$$
\sigma_{22,t} = 0.00008 + 0.041a_{2,t-1}^2 + 0.945\sigma_{22,t-1}.
$$

The output also provides some model checking statistics for individual stock 
returns. For instance, the Ljung-Box statistics for the standardized residual 
series of Pfizer and Merck stock returns give Q(12) = 9.53(0.66) and 
Q(12) = 12.35(0.42), respectively, where the number in parentheses denotes 
the p value. For the squared standardized residuals, the corresponding statistics 
are Q(12) = 22.08(0.04) and Q(12) = 6.44(0.89). Thus, checking the fitted model 
individually, one cannot reject the DVEC(1,1) model at the 1% level. A more 
informative model-checking approach is to apply 
the multivariate Q statistics to the bivariate standardized residual series and its 
squared process. Details are omitted. Interested readers are referred to Li (2004). 
Figure 10.5 shows the fitted volatility and correlation series. These series are 
stored in "sigma.t" and "R.t", respectively. The correlations range from 0.37 
to 0.83. 

Figure 10.5 Estimated volatilities (standard error) and time-varying correlations of DVEC(1,1) model 
for monthly simple returns of two major drug companies from January 1965 to December 2008: (a) 
Pfizer stock volatility, (b) Merck stock volatility, and (c) time-varying correlations. 


10.2.2 BEKK Model 


To guarantee the positive-definite constraint, Engle and Kroner (1995) propose the 
Baba-Engle-Kraft-Kroner (BEKK) model, 


Z, = AA’ +) Aj(a;-ia)_;)A} +Y BjE,—; Bi, (10.6) 
i=l j=l 


where $A$ is a lower triangular matrix and $A_i$ and $B_j$ are $k \times k$ matrices. Based
on the symmetric parameterization of the model, $\Sigma_t$ is almost surely positive
definite provided that $AA'$ is positive definite. This model also allows for dynamic
dependence between the volatility series. On the other hand, the model has several
disadvantages. First, the parameters in $A_i$ and $B_j$ do not have direct interpretations
concerning lagged values of volatilities or shocks. Second, the number of parameters
employed is $k^2(m + s) + k(k + 1)/2$, which increases rapidly with $m$ and $s$.
Limited experience shows that many of the estimated parameters are statistically
insignificant, introducing additional complications in modeling.
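The recursion in Eq. (10.6) can be sketched for the case m = s = 1. The following is a minimal illustration in plain Python, not the S-Plus routine used in this chapter; the function names and all numerical values are hypothetical and chosen only to show the mechanics and the parameter count.

```python
def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x[i][r] * y[r][j] for r in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*x)]


def bekk11_step(A, A1, B1, a_prev, sigma_prev):
    """One update of Eq. (10.6) with m = s = 1."""
    outer = [[ai * aj for aj in a_prev] for ai in a_prev]    # a_{t-1} a_{t-1}'
    const = matmul(A, transpose(A))                           # A A'
    arch = matmul(matmul(A1, outer), transpose(A1))           # A_1 (a a') A_1'
    garch = matmul(matmul(B1, sigma_prev), transpose(B1))     # B_1 Sigma_{t-1} B_1'
    k = len(a_prev)
    return [[const[i][j] + arch[i][j] + garch[i][j] for j in range(k)]
            for i in range(k)]


def bekk_param_count(k, m, s):
    """Number of parameters k^2 (m + s) + k (k + 1) / 2."""
    return k * k * (m + s) + k * (k + 1) // 2


# Hypothetical inputs (not estimates from the text).
A = [[0.02, 0.0], [0.01, 0.01]]           # lower triangular
A1 = [[0.20, 0.05], [0.10, 0.20]]
B1 = [[0.90, 0.0], [-0.05, 0.95]]
a_prev = [0.03, -0.02]
sigma_prev = [[0.004, 0.002], [0.002, 0.005]]

sigma_t = bekk11_step(A, A1, B1, a_prev, sigma_prev)
det = sigma_t[0][0] * sigma_t[1][1] - sigma_t[0][1] * sigma_t[1][0]
print(sigma_t, det > 0, bekk_param_count(2, 1, 1))
```

Because each term is a quadratic form, the result stays symmetric, and it is positive definite here since $AA'$ is positive definite. For k = 2 and m = s = 1 the count is 11 parameters, which grows quickly with k.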


Example 10.3. To illustrate, we consider the monthly simple returns of Pfizer 
and Merck stocks of Example 10.2 and employ a BEKK(1,1) model. Again, S-Plus 
is used to perform the estimation: 


> mbekk = mgarch(rtn~1, ~bekk(1,1))
> summary(mbekk)


Call: 
mgarch(formula.mean = rtn ~ 1, formula.var = ~ bekk(1, 1)) 
Mean Equation: structure(.Data = rtn ~ 1, class = "formula") 


Conditional Var. Eq.: structure(.Data=~bekk(1,1), 
class="formula") 
Conditional Distribution: gaussian 


                     Value Std.Error    t value  Pr(>|t|)
          C(1)   1.329e-02  0.003247  4.094e+00 4.907e-05
          C(2)   1.269e-02  0.003095  4.100e+00 4.792e-05
       A(1, 1)   2.505e-02  0.008382  2.988e+00 2.938e-03
       A(2, 1)   1.349e-02  0.004979  2.710e+00 6.946e-03
       A(2, 2)   3.272e-06  8.453262  3.870e-07 1.000e+00
 ARCH(1; 1, 1)   2.129e-01  0.084340  2.524e+00 1.190e-02
 ARCH(1; 2, 1)   9.963e-02  0.072156  1.381e+00 1.680e-01
 ARCH(1; 1, 2)   6.336e-02  0.076065  8.330e-01 4.052e-01
 ARCH(1; 2, 2)   1.824e-01  0.062133  2.936e+00 3.467e-03
GARCH(1; 1, 1)   9.090e-01  0.063239  1.437e+01 0.000e+00
GARCH(1; 2, 1)  -5.888e-02  0.047766 -1.233e+00 2.182e-01
GARCH(1; 1, 2)  -8.231e-03  0.031512 -2.612e-01 7.940e-01
GARCH(1; 2, 2)   9.824e-01  0.022587  4.349e+01 0.000e+00


    Statistic P-value Chi^2-d.f.
pfe     9.465  0.6628         12
mrk    11.591  0.4791         12


Ljung-Box test for squared standardized residuals:


    Statistic P-value Chi^2-d.f.
pfe    21.455 0.04291         12
mrk     9.19  0.68664         12


Model-checking statistics based on the individual residual series and provided
by S-Plus fail to suggest any model inadequacy of the fitted BEKK(1,1) model.
Figure 10.6 shows the fitted volatilities and the time-varying correlations of the
BEKK(1,1) model. Compared with Figure 10.5, there are some differences between
the two fitted volatility models. For instance, the time-varying correlations of the
BEKK(1,1) model appear to be more volatile.

Figure 10.6 Estimated volatilities (standard error) and time-varying correlations of BEKK(1,1) model
for monthly simple returns of two major drug companies from January 1965 to December 2008: (a)
Pfizer stock volatility, (b) Merck stock volatility, and (c) time-varying correlations.

The volatility equation of the fitted BEKK(1,1) model is 


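$$
\begin{aligned}
\begin{bmatrix} \sigma_{11,t} & \sigma_{12,t} \\ \sigma_{21,t} & \sigma_{22,t} \end{bmatrix}
&= \begin{bmatrix} 0.025 & 0 \\ 0.013 & 3\times 10^{-6} \end{bmatrix}
\begin{bmatrix} 0.025 & 0.013 \\ 0 & 3\times 10^{-6} \end{bmatrix} \\
&\quad + \begin{bmatrix} 0.213 & 0.063 \\ 0.100 & 0.182 \end{bmatrix}
\begin{bmatrix} a_{1,t-1}^2 & a_{1,t-1}a_{2,t-1} \\ a_{2,t-1}a_{1,t-1} & a_{2,t-1}^2 \end{bmatrix}
\begin{bmatrix} 0.213 & 0.100 \\ 0.063 & 0.182 \end{bmatrix} \\
&\quad + \begin{bmatrix} 0.909 & -0.008 \\ -0.059 & 0.982 \end{bmatrix}
\begin{bmatrix} \sigma_{11,t-1} & \sigma_{12,t-1} \\ \sigma_{21,t-1} & \sigma_{22,t-1} \end{bmatrix}
\begin{bmatrix} 0.909 & -0.059 \\ -0.008 & 0.982 \end{bmatrix},
\end{aligned}
$$

where three estimates are insignificant at the 5% level. In general, the BEKK model
tends to contain some insignificant parameter estimates, and one needs to perform
matrix multiplication to decipher the fitted model.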


10.3 REPARAMETERIZATION 


A useful step in multivariate volatility modeling is to reparameterize $\Sigma_t$ by making
use of its symmetric property. We consider two reparameterizations.


10.3.1 Use of Correlations 


The first reparameterization of $\Sigma_t$ is to use the conditional correlation coefficients
and variances of $a_t$. Specifically, we write $\Sigma_t$ as

$$
\Sigma_t \equiv [\sigma_{ij,t}] = D_t\,\rho_t\,D_t, \tag{10.7}
$$

where $\rho_t$ is the conditional correlation matrix of $a_t$, and $D_t$ is a $k \times k$ diagonal
matrix consisting of the conditional standard deviations of elements of $a_t$ (i.e.,
$D_t = \mathrm{diag}\{\sqrt{\sigma_{11,t}}, \ldots, \sqrt{\sigma_{kk,t}}\}$).


Because $\rho_t$ is symmetric with unit diagonal elements, the time evolution of $\Sigma_t$
is governed by that of the conditional variances $\sigma_{ii,t}$ and the elements $\rho_{ij,t}$ of $\rho_t$,
where $j < i$ and $1 \le i \le k$. Therefore, to model the volatility of $a_t$, it suffices to
consider the conditional variances and correlation coefficients of $a_{it}$. Define the
$k(k + 1)/2$-dimensional vector

$$
\Xi_t = (\sigma_{11,t}, \ldots, \sigma_{kk,t}, \varrho_t')', \tag{10.8}
$$

where $\varrho_t$ is a $k(k - 1)/2$-dimensional vector obtained by stacking columns of the
correlation matrix $\rho_t$, but using only elements below the main diagonal. Specifically,
for a k-dimensional return series,

$$
\varrho_t = (\rho_{21,t}, \ldots, \rho_{k1,t} \mid \rho_{32,t}, \ldots, \rho_{k2,t} \mid \cdots \mid \rho_{k,k-1,t})'.
$$

To illustrate, for $k = 2$, we have $\varrho_t = \rho_{21,t}$ and

$$
\Xi_t = (\sigma_{11,t}, \sigma_{22,t}, \rho_{21,t})', \tag{10.9}
$$

which is a three-dimensional vector, and for $k = 3$, we have $\varrho_t = (\rho_{21,t}, \rho_{31,t}, \rho_{32,t})'$ and

$$
\Xi_t = (\sigma_{11,t}, \sigma_{22,t}, \sigma_{33,t}, \rho_{21,t}, \rho_{31,t}, \rho_{32,t})', \tag{10.10}
$$

which is a six-dimensional random vector.
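If $a_t$ is a bivariate normal random variable, then $\Xi_t$ is given in Eq. (10.9) and
the conditional density function of $a_t$ given $F_{t-1}$ is

$$
f(a_{1t}, a_{2t} \mid \Xi_t) = \frac{1}{2\pi\sqrt{\sigma_{11,t}\sigma_{22,t}(1 - \rho_{21,t}^2)}}
\exp\left[-\frac{Q(a_{1t}, a_{2t}, \Xi_t)}{2(1 - \rho_{21,t}^2)}\right],
$$

where

$$
Q(a_{1t}, a_{2t}, \Xi_t) = \frac{a_{1t}^2}{\sigma_{11,t}} + \frac{a_{2t}^2}{\sigma_{22,t}}
- \frac{2\rho_{21,t}\,a_{1t}a_{2t}}{\sqrt{\sigma_{11,t}\sigma_{22,t}}}.
$$

The log probability density function of $a_t$ relevant to the maximum-likelihood
estimation is

$$
\ell(a_{1t}, a_{2t}, \Xi_t) = -\frac{1}{2}\left\{\ln\left[\sigma_{11,t}\sigma_{22,t}(1 - \rho_{21,t}^2)\right]
+ \frac{1}{1 - \rho_{21,t}^2}\left[\frac{a_{1t}^2}{\sigma_{11,t}} + \frac{a_{2t}^2}{\sigma_{22,t}}
- \frac{2\rho_{21,t}\,a_{1t}a_{2t}}{\sqrt{\sigma_{11,t}\sigma_{22,t}}}\right]\right\}. \tag{10.11}
$$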


This reparameterization is useful because it models covariances and correlations
directly. Yet the approach has several weaknesses. First, the likelihood function
becomes complicated when $k \ge 3$. Second, the approach requires a constrained
maximization in estimation to ensure the positive definiteness of $\Sigma_t$. The constraint
becomes complicated when k is large.
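As a small illustration of the bivariate log density under the correlation parameterization, the sketch below evaluates it for one observation. The function name and the guard on the correlation are assumptions added for illustration; the constant term involving 2π is dropped, as is usual for maximum-likelihood estimation.

```python
import math


def loglik_corr(a1, a2, s11, s22, rho):
    """Bivariate normal log density in terms of (s11, s22, rho), constant omitted."""
    if s11 <= 0 or s22 <= 0 or abs(rho) >= 1:
        # These are the constraints that keep the covariance matrix positive definite.
        raise ValueError("need s11 > 0, s22 > 0 and |rho| < 1")
    q = a1 * a1 / s11 + a2 * a2 / s22 - 2.0 * rho * a1 * a2 / math.sqrt(s11 * s22)
    one_minus = 1.0 - rho * rho
    return -0.5 * (math.log(s11 * s22 * one_minus) + q / one_minus)


# With rho = 0 the value is the sum of two univariate normal log densities.
joint = loglik_corr(1.0, -0.5, 4.0, 2.0, 0.0)
separate = -0.5 * (math.log(4.0) + 1.0 / 4.0) - 0.5 * (math.log(2.0) + 0.25 / 2.0)
print(joint, separate)
```

The explicit check on `rho` hints at why estimation under this parameterization is a constrained maximization.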


10.3.2 Cholesky Decomposition 


The second reparameterization of $\Sigma_t$ is to use the Cholesky decomposition; see
Appendix A of Chapter 8. This approach has some advantages in estimation as it
requires no parameter constraints for the positive definiteness of $\Sigma_t$; see Pourahmadi
(1999). In addition, the reparameterization is an orthogonal transformation so
that the resulting likelihood function is extremely simple. Details of the transformation
are given next.

Because $\Sigma_t$ is positive definite, there exist a lower triangular matrix $L_t$ with
unit diagonal elements and a diagonal matrix $G_t$ with positive diagonal elements
such that

$$
\Sigma_t = L_t G_t L_t'. \tag{10.12}
$$


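This is the well-known Cholesky decomposition of $\Sigma_t$. A feature of the decomposition
is that the lower off-diagonal elements of $L_t$ and the diagonal elements of $G_t$
have nice interpretations. We demonstrate the decomposition by studying carefully
the bivariate and three-dimensional cases. For the bivariate case, we have

$$
\Sigma_t = \begin{bmatrix} \sigma_{11,t} & \sigma_{21,t} \\ \sigma_{21,t} & \sigma_{22,t} \end{bmatrix}, \quad
L_t = \begin{bmatrix} 1 & 0 \\ q_{21,t} & 1 \end{bmatrix}, \quad
G_t = \begin{bmatrix} g_{11,t} & 0 \\ 0 & g_{22,t} \end{bmatrix},
$$

where $g_{ii,t} > 0$ for $i = 1$ and 2. Using Eq. (10.12), we have

$$
\Sigma_t = \begin{bmatrix} \sigma_{11,t} & \sigma_{12,t} \\ \sigma_{21,t} & \sigma_{22,t} \end{bmatrix}
= \begin{bmatrix} g_{11,t} & q_{21,t}\,g_{11,t} \\ q_{21,t}\,g_{11,t} & g_{22,t} + q_{21,t}^2\,g_{11,t} \end{bmatrix}.
$$

Equating elements of the prior matrix equation, we obtain

$$
\sigma_{11,t} = g_{11,t}, \quad \sigma_{21,t} = q_{21,t}\,g_{11,t}, \quad \sigma_{22,t} = g_{22,t} + q_{21,t}^2\,g_{11,t}. \tag{10.13}
$$

Solving the prior equations, we have

$$
g_{11,t} = \sigma_{11,t}, \quad q_{21,t} = \frac{\sigma_{21,t}}{\sigma_{11,t}}, \quad g_{22,t} = \sigma_{22,t} - \frac{\sigma_{21,t}^2}{\sigma_{11,t}}. \tag{10.14}
$$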
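However, consider the simple linear regression

$$
a_{2t} = \beta a_{1t} + b_{2t}, \tag{10.15}
$$

where $b_{2t}$ denotes the error term. From the well-known least-squares theory, we
have

$$
\beta = \frac{\mathrm{Cov}(a_{1t}, a_{2t})}{\mathrm{Var}(a_{1t})} = \frac{\sigma_{21,t}}{\sigma_{11,t}},
$$

$$
\mathrm{Var}(b_{2t}) = \mathrm{Var}(a_{2t}) - \beta^2\,\mathrm{Var}(a_{1t}) = \sigma_{22,t} - \frac{\sigma_{21,t}^2}{\sigma_{11,t}}.
$$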


Furthermore, the error term $b_{2t}$ is uncorrelated with the regressor $a_{1t}$. Consequently,
using Eq. (10.14), we obtain

$$
g_{11,t} = \sigma_{11,t}, \quad q_{21,t} = \beta, \quad g_{22,t} = \mathrm{Var}(b_{2t}), \quad b_{2t} \perp a_{1t},
$$

where $\perp$ denotes no correlation. In summary, the Cholesky decomposition of the
$2 \times 2$ matrix $\Sigma_t$ amounts to performing an orthogonal transformation from $a_t$ to
$b_t = (b_{1t}, b_{2t})'$ such that

$$
b_{1t} = a_{1t} \quad \text{and} \quad b_{2t} = a_{2t} - q_{21,t}\,a_{1t},
$$

where $q_{21,t} = \beta$ is obtained by the linear regression (10.15) and $\mathrm{Cov}(b_t)$ is a diagonal
matrix with diagonal elements $g_{ii,t}$. The transformed quantities $q_{21,t}$ and $g_{ii,t}$
can be interpreted as follows:

1. The first diagonal element of $G_t$ is simply the variance of $a_{1t}$.

2. The second diagonal element of $G_t$ is the residual variance of the simple
linear regression in Eq. (10.15).

3. The element $q_{21,t}$ of the lower triangular matrix $L_t$ is the coefficient $\beta$ of
the regression in Eq. (10.15).
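The regression interpretation can be checked numerically on a small sample. The sketch below is only an illustration: the data are made up, the helper name is hypothetical, and uncentered second moments are used on the assumption that the shocks have mean zero.

```python
def second_moments(x, y):
    """Uncentered sample moments (s11, s21, s22) of two zero-mean series."""
    n = len(x)
    s11 = sum(u * u for u in x) / n
    s21 = sum(u * v for u, v in zip(x, y)) / n
    s22 = sum(v * v for v in y) / n
    return s11, s21, s22


# Hypothetical shocks a_1t and a_2t.
a1 = [1.0, -2.0, 0.5, 1.5, -1.0]
a2 = [0.8, -1.1, 0.9, 0.4, -1.0]

s11, s21, s22 = second_moments(a1, a2)

# Eq. (10.14): elements of the Cholesky decomposition.
g11 = s11
q21 = s21 / s11
g22 = s22 - s21 ** 2 / s11

# Regression (10.15) through the origin: slope q21, residuals b_2t.
b2 = [v - q21 * u for u, v in zip(a1, a2)]
resid_var = sum(e * e for e in b2) / len(b2)
cross = sum(u * e for u, e in zip(a1, b2)) / len(b2)

print(q21, g22, resid_var, cross)
```

Here `resid_var` agrees with `g22` and `cross` is zero up to rounding, which is the content of the three interpretations listed above.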


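The prior properties continue to hold for the higher dimensional case. For example,
consider the three-dimensional case in which

$$
L_t = \begin{bmatrix} 1 & 0 & 0 \\ q_{21,t} & 1 & 0 \\ q_{31,t} & q_{32,t} & 1 \end{bmatrix}, \quad
G_t = \begin{bmatrix} g_{11,t} & 0 & 0 \\ 0 & g_{22,t} & 0 \\ 0 & 0 & g_{33,t} \end{bmatrix}.
$$

From the decomposition in Eq. (10.12), we have

$$
\begin{bmatrix} \sigma_{11,t} & \sigma_{21,t} & \sigma_{31,t} \\ \sigma_{21,t} & \sigma_{22,t} & \sigma_{32,t} \\ \sigma_{31,t} & \sigma_{32,t} & \sigma_{33,t} \end{bmatrix}
= \begin{bmatrix}
g_{11,t} & q_{21,t}g_{11,t} & q_{31,t}g_{11,t} \\
q_{21,t}g_{11,t} & q_{21,t}^2 g_{11,t} + g_{22,t} & q_{31,t}q_{21,t}g_{11,t} + q_{32,t}g_{22,t} \\
q_{31,t}g_{11,t} & q_{31,t}q_{21,t}g_{11,t} + q_{32,t}g_{22,t} & q_{31,t}^2 g_{11,t} + q_{32,t}^2 g_{22,t} + g_{33,t}
\end{bmatrix}.
$$

Equating elements of the prior matrix equation, we obtain

$$
\begin{aligned}
\sigma_{11,t} &= g_{11,t}, & \sigma_{21,t} &= q_{21,t}g_{11,t}, \\
\sigma_{22,t} &= q_{21,t}^2 g_{11,t} + g_{22,t}, & \sigma_{31,t} &= q_{31,t}g_{11,t}, \\
\sigma_{32,t} &= q_{31,t}q_{21,t}g_{11,t} + q_{32,t}g_{22,t}, & \sigma_{33,t} &= q_{31,t}^2 g_{11,t} + q_{32,t}^2 g_{22,t} + g_{33,t},
\end{aligned}
$$

or, equivalently,

$$
\begin{aligned}
g_{11,t} &= \sigma_{11,t}, & q_{21,t} &= \frac{\sigma_{21,t}}{\sigma_{11,t}}, & g_{22,t} &= \sigma_{22,t} - q_{21,t}^2 g_{11,t}, \\
q_{31,t} &= \frac{\sigma_{31,t}}{\sigma_{11,t}}, & q_{32,t} &= \frac{1}{g_{22,t}}\left(\sigma_{32,t} - \frac{\sigma_{31,t}\sigma_{21,t}}{\sigma_{11,t}}\right), & &
\end{aligned}
$$

$$
g_{33,t} = \sigma_{33,t} - q_{31,t}^2 g_{11,t} - q_{32,t}^2 g_{22,t}.
$$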


These quantities look complicated, but they are simply the coefficients and residual
variances of the orthogonal transformation

$$
\begin{aligned}
b_{1t} &= a_{1t}, \\
b_{2t} &= a_{2t} - \beta_{21} b_{1t}, \\
b_{3t} &= a_{3t} - \beta_{31} b_{1t} - \beta_{32} b_{2t},
\end{aligned}
$$

where $\beta_{ij}$ are the coefficients of least-squares regressions

$$
\begin{aligned}
a_{2t} &= \beta_{21} b_{1t} + b_{2t}, \\
a_{3t} &= \beta_{31} b_{1t} + \beta_{32} b_{2t} + b_{3t}.
\end{aligned}
$$

In other words, we have $q_{ij,t} = \beta_{ij}$, $g_{ii,t} = \mathrm{Var}(b_{it})$, and $b_{it} \perp b_{jt}$ for $i \ne j$.

Based on the prior discussion, using Cholesky decomposition amounts to doing
an orthogonal transformation from $a_t$ to $b_t$, where $b_{1t} = a_{1t}$, and $b_{it}$, for $1 < i \le k$,
is defined recursively by the least-squares regression

$$
a_{it} = q_{i1,t} b_{1t} + q_{i2,t} b_{2t} + \cdots + q_{i(i-1),t} b_{(i-1)t} + b_{it}, \tag{10.16}
$$

where $q_{ij,t}$ is the $(i, j)$th element of the lower triangular matrix $L_t$ for $1 \le j < i$.
We can write this transformation as

$$
b_t = L_t^{-1} a_t \quad \text{or} \quad a_t = L_t b_t, \tag{10.17}
$$

where, as mentioned before, $L_t^{-1}$ is also a lower triangular matrix with unit diagonal
elements. The covariance matrix of $b_t$ is the diagonal matrix $G_t$ of the Cholesky
decomposition because

$$
\mathrm{Cov}(b_t) = L_t^{-1} \Sigma_t (L_t^{-1})' = G_t.
$$


The parameter vector relevant to volatility modeling under such a transformation
becomes

$$
\Xi_t = (g_{11,t}, \ldots, g_{kk,t}, q_{21,t}, q_{31,t}, q_{32,t}, \ldots, q_{k1,t}, \ldots, q_{k(k-1),t})', \tag{10.18}
$$

which is also a $k(k + 1)/2$-dimensional vector.
The previous orthogonal transformation also dramatically simplifies the likelihood
function of the data. Using the fact that $|L_t| = 1$, we have

$$
|\Sigma_t| = |L_t G_t L_t'| = |G_t| = \prod_{i=1}^{k} g_{ii,t}. \tag{10.19}
$$

If the conditional distribution of $a_t$ given the past information is multivariate normal
$N(0, \Sigma_t)$, then the conditional distribution of the transformed series $b_t$ is
multivariate normal $N(0, G_t)$, and the log-likelihood function of the data becomes
extremely simple. Indeed, we have the log probability density of $a_t$ as

$$
\ell(a_t, \Sigma_t) = \ell(b_t, \Xi_t) = -\frac{1}{2} \sum_{i=1}^{k} \left[\ln(g_{ii,t}) + \frac{b_{it}^2}{g_{ii,t}}\right], \tag{10.20}
$$

where for simplicity the constant term is omitted and $g_{ii,t}$ is the variance of $b_{it}$.
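The decomposition in Eq. (10.12), the transformation in Eq. (10.17), and the log density in Eq. (10.20) can be put together in a few lines. This is a minimal sketch in plain Python under the assumption that the covariance matrix is supplied as a list of rows and is positive definite; the function names and the test matrix are hypothetical.

```python
import math


def ldl(sigma):
    """Return (L, g) with sigma = L diag(g) L' and L unit lower triangular."""
    k = len(sigma)
    L = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    g = [0.0] * k
    for j in range(k):
        # Diagonal element: variance left after removing earlier components.
        g[j] = sigma[j][j] - sum(L[j][v] ** 2 * g[v] for v in range(j))
        for i in range(j + 1, k):
            L[i][j] = (sigma[i][j]
                       - sum(L[i][v] * L[j][v] * g[v] for v in range(j))) / g[j]
    return L, g


def transform_shocks(L, a):
    """Solve a = L b for b by forward substitution, as in Eq. (10.17)."""
    b = []
    for i in range(len(a)):
        b.append(a[i] - sum(L[i][v] * b[v] for v in range(i)))
    return b


def loglik_cholesky(g, b):
    """Log density of Eq. (10.20), constant term omitted."""
    return -0.5 * sum(math.log(gi) + bi * bi / gi for gi, bi in zip(g, b))


sigma3 = [[4.0, 2.0, 1.0], [2.0, 3.0, 1.0], [1.0, 1.0, 2.0]]   # hypothetical
L3, g3 = ldl(sigma3)
b3 = transform_shocks(L3, [1.0, -0.5, 0.25])
print(L3, g3, loglik_cholesky(g3, b3))
```

For this matrix the product of the elements of `g3` equals the determinant of `sigma3`, in line with Eq. (10.19).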
Using the Cholesky decomposition to reparameterize $\Sigma_t$ has several advantages.
First, from Eq. (10.19), $\Sigma_t$ is positive definite if $g_{ii,t} > 0$ for all i. Consequently,
the positive-definite constraint of $\Sigma_t$ can easily be achieved by modeling $\ln(g_{ii,t})$
instead of $g_{ii,t}$. Second, elements of the parameter vector $\Xi_t$ in Eq. (10.18) have
nice interpretations. They are the coefficients and residual variances of multiple
linear regressions that orthogonalize the shocks to the returns. Third, the correlation
coefficient between $a_{1t}$ and $a_{2t}$ is

$$
\rho_{21,t} = \frac{\sigma_{21,t}}{\sqrt{\sigma_{11,t}\sigma_{22,t}}} = q_{21,t} \times \frac{\sqrt{\sigma_{11,t}}}{\sqrt{\sigma_{22,t}}},
$$

which is time varying if $q_{21,t} \ne 0$. In particular, if $q_{21,t} = c \ne 0$, then $\rho_{21,t} =
c\sqrt{\sigma_{11,t}}/\sqrt{\sigma_{22,t}}$, which continues to be time-varying provided that the variance ratio
$\sigma_{11,t}/\sigma_{22,t}$ is not a constant. This time-varying property applies to other correlation
coefficients when the dimension of $r_t$ is greater than 2 and is a major difference
between the two approaches for reparameterizing $\Sigma_t$.

Using Eq. (10.16) and the orthogonality among the transformed shocks $b_{it}$, we
obtain

$$
\sigma_{ii,t} = \mathrm{Var}(a_{it} \mid F_{t-1}) = \sum_{v=1}^{i} q_{iv,t}^2\, g_{vv,t}, \quad i = 1, \ldots, k,
$$

$$
\sigma_{ij,t} = \mathrm{Cov}(a_{it}, a_{jt} \mid F_{t-1}) = \sum_{v=1}^{j} q_{iv,t}\, q_{jv,t}\, g_{vv,t}, \quad j < i, \quad i = 2, \ldots, k,
$$

where $q_{vv,t} = 1$ for $v = 1, \ldots, k$. These equations show the parameterization of
$\Sigma_t$ under the Cholesky decomposition.


10.4 GARCH MODELS FOR BIVARIATE RETURNS 


Since the same techniques can be used to generalize many univariate volatility mod- 
els to the multivariate case, we focus our discussion on the multivariate GARCH 
model. Other multivariate volatility models can also be used. 

For a k-dimensional return series $r_t$, a multivariate GARCH model uses "exact
equations" to describe the evolution of the $k(k + 1)/2$-dimensional vector $\Xi_t$ over
time. By exact equation, we mean that the equation does not contain any stochastic
shock. However, the exact equation may become complicated even in the simplest
case of $k = 2$ for which $\Xi_t$ is three dimensional. To keep the model simple, some
restrictions are often imposed on the equations.


10.4.1 Constant-Correlation Models 


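To keep the number of volatility equations low, Bollerslev (1990) considers the
special case in which the correlation coefficient $\rho_{21,t} = \rho_{21}$ is time invariant, where
$|\rho_{21}| < 1$. Under such an assumption, $\rho_{21}$ is a constant parameter and the volatility
model consists of two equations for $\Xi_t^*$, which is defined as $\Xi_t^* = (\sigma_{11,t}, \sigma_{22,t})'$.
A GARCH(1,1) model for $\Xi_t^*$ becomes

$$
\Xi_t^* = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \Xi_{t-1}^*, \tag{10.21}
$$

where $a_{t-1}^2 = (a_{1,t-1}^2, a_{2,t-1}^2)'$, $\alpha_0$ is a two-dimensional positive vector, and $\alpha_1$ and
$\beta_1$ are $2 \times 2$ nonnegative definite matrices. More specifically, the model can be
expressed in detail as

$$
\begin{bmatrix} \sigma_{11,t} \\ \sigma_{22,t} \end{bmatrix}
= \begin{bmatrix} \alpha_{10} \\ \alpha_{20} \end{bmatrix}
+ \begin{bmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{bmatrix}
\begin{bmatrix} a_{1,t-1}^2 \\ a_{2,t-1}^2 \end{bmatrix}
+ \begin{bmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{bmatrix}
\begin{bmatrix} \sigma_{11,t-1} \\ \sigma_{22,t-1} \end{bmatrix}, \tag{10.22}
$$

where $\alpha_{i0} > 0$ for $i = 1$ and 2. Defining $\eta_t = a_t^2 - \Xi_t^*$, we can rewrite the prior
model as

$$
a_t^2 = \alpha_0 + (\alpha_1 + \beta_1) a_{t-1}^2 + \eta_t - \beta_1 \eta_{t-1},
$$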


which is a bivariate ARMA(1,1) model for the $a_t^2$ process. This result is a direct
generalization of the univariate GARCH(1,1) model of Chapter 3. Consequently,
some properties of model (10.22) are readily available from those of the bivariate
ARMA(1,1) model of Chapter 8. In particular, we have the following results:

1. If all of the eigenvalues of $\alpha_1 + \beta_1$ are positive, but less than 1, then the
bivariate ARMA(1,1) model for $a_t^2$ is weakly stationary and, hence, $E(a_t^2)$
exists. This implies that the shock process $a_t$ of the returns has a positive-definite
unconditional covariance matrix. The unconditional variances of the
elements of $a_t$ are $(\sigma_1^2, \sigma_2^2)' = (I - \alpha_1 - \beta_1)^{-1}\alpha_0$, and the unconditional
covariance between $a_{1t}$ and $a_{2t}$ is $\rho_{21}\sigma_1\sigma_2$.

2. If $\alpha_{12} = \beta_{12} = 0$, then the volatility of $a_{1t}$ does not depend on the past
volatility of $a_{2t}$. Similarly, if $\alpha_{21} = \beta_{21} = 0$, then the volatility of $a_{2t}$ does
not depend on the past volatility of $a_{1t}$.

3. If both $\alpha_1$ and $\beta_1$ are diagonal, then the model reduces to two univariate
GARCH(1,1) models. In this case, the two volatility processes are not
dynamically related.

4. Volatility forecasts of the model can be obtained by using forecasting methods
similar to those of a vector ARMA(1,1) model; see the univariate case in
Chapter 3. The 1-step-ahead volatility forecast at the forecast origin h is

$$
\Xi_h^*(1) = \alpha_0 + \alpha_1 a_h^2 + \beta_1 \Xi_h^*.
$$

For the $\ell$-step-ahead forecast, we have

$$
\Xi_h^*(\ell) = \alpha_0 + (\alpha_1 + \beta_1)\,\Xi_h^*(\ell - 1), \quad \ell > 1.
$$

These forecasts are for the marginal volatilities of $a_{it}$. The $\ell$-step-ahead forecast
of the covariance between $a_{1t}$ and $a_{2t}$ is $\hat{\rho}_{21}[\sigma_{11,h}(\ell)\,\sigma_{22,h}(\ell)]^{0.5}$, where
$\hat{\rho}_{21}$ is the estimate of $\rho_{21}$ and $\sigma_{ii,h}(\ell)$ are the elements of $\Xi_h^*(\ell)$.
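The forecasting recursion in result 4 can be sketched as follows. This is an illustrative implementation in plain Python rather than the S-Plus routine; the function names are hypothetical, the coefficients are those of the diagonal fit reported in Example 10.4, and the state at the forecast origin (shocks and current variances) is made up.

```python
import math


def mat_vec(m, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def ccc_forecast(alpha0, alpha1, beta1, a_h, xi_h, rho21, steps):
    """Forecasts (s11, s22, s21) for horizons 1..steps of a bivariate CCC GARCH(1,1)."""
    a_sq = [x * x for x in a_h]
    arch = mat_vec(alpha1, a_sq)
    garch = mat_vec(beta1, xi_h)
    xi = [alpha0[i] + arch[i] + garch[i] for i in range(2)]        # 1-step forecast
    persist = [[alpha1[i][j] + beta1[i][j] for j in range(2)] for i in range(2)]
    out = []
    for _ in range(steps):
        cov = rho21 * math.sqrt(xi[0] * xi[1])                      # constant correlation
        out.append((xi[0], xi[1], cov))
        nxt = mat_vec(persist, xi)
        xi = [alpha0[i] + nxt[i] for i in range(2)]                 # multistep recursion
    return out


alpha0 = [0.079, 0.054]
alpha1 = [[0.145, 0.0], [0.0, 0.105]]
beta1 = [[0.833, 0.0], [0.0, 0.875]]
# Hypothetical state at the forecast origin h.
forecasts = ccc_forecast(alpha0, alpha1, beta1, [1.0, -0.5], [2.0, 1.5], 0.668, 3000)
print(forecasts[0], forecasts[-1])
```

When the eigenvalues of $\alpha_1 + \beta_1$ are below 1, as they are here, the forecasts settle at the unconditional variances given in result 1 as the horizon grows.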


Example 10.4. Again, consider the daily log returns of Hong Kong and 
Japanese markets of Example 10.1. Using a bivariate GARCH model, we obtain a 
constant correlation model that fits the data reasonably well. The mean equations 
of the bivariate model are 


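$$
\begin{aligned}
r_{1t} &= 0.101 + a_{1t}, \\
r_{2t} &= 0.002 + a_{2t},
\end{aligned}
$$

where the standard errors of the two estimates are 0.050 and 0.048, respectively.
The volatility equations are

$$
\begin{bmatrix} \sigma_{11,t} \\ \sigma_{22,t} \end{bmatrix}
= \begin{bmatrix} \underset{(0.019)}{0.079} \\ \underset{(0.019)}{0.054} \end{bmatrix}
+ \begin{bmatrix} \underset{(0.022)}{0.145} & \\ & \underset{(0.014)}{0.105} \end{bmatrix}
\begin{bmatrix} a_{1,t-1}^2 \\ a_{2,t-1}^2 \end{bmatrix}
+ \begin{bmatrix} \underset{(0.023)}{0.833} & \\ & \underset{(0.020)}{0.875} \end{bmatrix}
\begin{bmatrix} \sigma_{11,t-1} \\ \sigma_{22,t-1} \end{bmatrix}, \tag{10.23}
$$

where the numbers in parentheses are standard errors. The estimated constant
correlation between the two returns is 0.668.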

Let $\tilde{a}_t = (\tilde{a}_{1t}, \tilde{a}_{2t})'$ be the standardized residuals, where $\tilde{a}_{it} = a_{it}/\sqrt{\sigma_{ii,t}}$. The
Ljung-Box statistics of $\tilde{a}_t$ give $Q_2(4) = 17.29(0.37)$ and $Q_2(12) = 48.21(0.46)$,
where the number in parentheses denotes the p value. Here the p values are
based on chi-squared distributions with 16 and 48 degrees of freedom, respectively.
The Q statistics of individual series $\tilde{a}_{it}$ shown in S-Plus output also fail
to indicate any model inadequacy. Consequently, the constant-correlation model
in Eq. (10.23) fits the data reasonably well. Figure 10.7 shows the fitted volatility
processes of model (10.23), which can be compared with those of Example
10.1.

The model in Eq. (10.23) shows two uncoupled volatility equations, indicating
that the volatilities of the two markets are not dynamically related, but they
are contemporaneously correlated. We refer to the model as a bivariate diagonal
constant-correlation model. In practice, this type of model might not be suitable
because there exists the possibility of dynamic dependence in volatility among
markets, that is, the spillover effect in volatility. Finally, the constant-correlation
model can easily be estimated using S-Plus:


> mccc = mgarch(rtn~1,~ccc(1,1),trace=F)
> summary(mccc)


Figure 10.7 Estimated volatilities for daily log returns in percentages of stock market indexes for
Hong Kong and Japan from January 4, 2006, to December 30, 2008: (a) Hong Kong market and (b)
Japanese market. Model used is Eq. (10.23).


Example 10.5. As a second illustration, consider the monthly log returns, in 
percentages, of IBM stock and the S&P 500 index from January 1926 to December 
1999 used in Chapter 8. Let $r_{1t}$ and $r_{2t}$ be the monthly log returns for IBM stock
and the S&P 500 index, respectively. If a constant-correlation GARCH(1,1) model
is entertained, we obtain the mean equations

$$
\begin{aligned}
r_{1t} &= 1.351 + 0.072\,r_{1,t-1} + 0.055\,r_{1,t-2} - 0.119\,r_{2,t-2} + a_{1t}, \\
r_{2t} &= 0.703 + a_{2t},
\end{aligned}
$$

where standard errors of the parameters in the first equation are 0.225, 0.029, 0.034,
and 0.044, respectively, and the standard error of the parameter in the second
equation is 0.155. The volatility equations are

$$
\begin{bmatrix} \sigma_{11,t} \\ \sigma_{22,t} \end{bmatrix}
= \begin{bmatrix} \underset{(0.59)}{2.98} \\ \underset{(0.47)}{2.09} \end{bmatrix}
+ \begin{bmatrix} \underset{(0.013)}{0.079} & \\ \underset{(0.009)}{0.042} & \underset{(0.010)}{0.045} \end{bmatrix}
\begin{bmatrix} a_{1,t-1}^2 \\ a_{2,t-1}^2 \end{bmatrix}
+ \begin{bmatrix} \underset{(0.020)}{0.873} & \underset{(0.009)}{-0.031} \\ \underset{(0.015)}{0.066} & \underset{(0.014)}{0.913} \end{bmatrix}
\begin{bmatrix} \sigma_{11,t-1} \\ \sigma_{22,t-1} \end{bmatrix}, \tag{10.24}
$$

where the numbers in parentheses are standard errors. The constant correlation
coefficient is 0.614 with standard error 0.020. Using the standardized residuals, we
obtain the Ljung-Box statistics $Q_2(4) = 16.77(0.21)$ and $Q_2(8) = 32.40(0.30)$,
where the p values shown in parentheses are obtained from chi-squared distributions
with 13 and 29 degrees of freedom, respectively. Here the degrees of
freedom have been adjusted because the mean equations contain three lagged
predictors. For the squared standardized residuals, we have $Q_2^*(4) = 18.00(0.16)$ and
$Q_2^*(8) = 39.09(0.10)$. Therefore, at the 5% significance level, the standardized
residuals $\tilde{a}_t$ have no serial correlations or conditional heteroscedasticities. This
bivariate GARCH(1,1) model shows a feedback relationship between the volatilities
of the two monthly log returns.


10.4.2 Time-Varying Correlation Models 


A major drawback of the constant-correlation volatility models is that the cor- 
relation coefficient tends to change over time in a real application. Consider the 
monthly log returns of IBM stock and the S&P 500 index used in Example 10.5. 
It is hard to justify that the S&P 500 index return, which is a weighted aver- 
age, can maintain a constant-correlation coefficient with IBM return over the past 
70 years. Figure 10.8 shows the sample correlation coefficient between the two 
monthly log return series using a moving window of 120 observations (i.e., 10 
years). The correlation changes over time and appears to be decreasing in recent 
years. The decreasing trend in correlation is not surprising because the ranking of 
IBM market capitalization among large U.S. industrial companies has changed in 
recent years. A Lagrange multiplier statistic was proposed recently by Tse (2000) 
to test constant-correlation coefficients in a multivariate GARCH model. 

A simple way to relax the constant-correlation constraint within the GARCH
framework is to specify an exact equation for the conditional correlation coefficient.
This can be done by two methods using the two reparameterizations of $\Sigma_t$
discussed in Section 10.3. First, we use the correlation coefficient directly. Because
the correlation coefficient between the returns of IBM stock and S&P 500 index is 
positive and must be in the interval [0, 1], we employ the equation 


$$
\rho_{21,t} = \frac{\exp(q_t)}{1 + \exp(q_t)}, \tag{10.25}
$$

where

$$
q_t = \varpi_0 + \varpi_1 \rho_{21,t-1} + \varpi_2 \frac{a_{1,t-1}a_{2,t-1}}{\sqrt{\sigma_{11,t-1}\sigma_{22,t-1}}},
$$

where $\sigma_{ii,t-1}$ is the conditional variance of the shock $a_{i,t-1}$. We refer to this equation
as a GARCH(1,1) model for the correlation coefficient because it uses the lag-1
cross correlation and the lag-1 cross product of the two shocks. If $\varpi_1 = \varpi_2 = 0$,
then model (10.25) reduces to the case of constant correlation.
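The correlation equation (10.25) amounts to a logistic map applied to a linear recursion. The sketch below is a hedged illustration in plain Python with a hypothetical function name. The second usage line borrows the coefficient estimates and the values at t = 888 that are reported for the IBM and S&P 500 fit later in this section; the first uses made-up numbers.

```python
import math


def correlation_step(w0, w1, w2, rho_prev, a1_prev, a2_prev, s11_prev, s22_prev):
    """One update of the conditional correlation as in Eq. (10.25)."""
    # Lag-1 cross product of the standardized shocks.
    cross = a1_prev * a2_prev / math.sqrt(s11_prev * s22_prev)
    q = w0 + w1 * rho_prev + w2 * cross
    # 1 / (1 + exp(-q)) is the same quantity as exp(q) / (1 + exp(q)).
    return 1.0 / (1.0 + math.exp(-q))


# With w1 = w2 = 0 the correlation is constant, whatever the shocks are.
rho_const = correlation_step(0.5, 0.0, 0.0, 0.3, 1.0, -2.0, 4.0, 9.0)

# Estimates and t = 888 values reported later for the IBM and S&P 500 example.
rho_next = correlation_step(-2.024, 3.983, 0.088, 0.546, 3.287, 4.950, 83.35, 28.56)
print(rho_const, rho_next)
```

The logistic map keeps the correlation strictly between 0 and 1, which is why this form suits a pair of series whose correlation is known to be positive.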


Figure 10.8 Sample correlation coefficient between monthly log returns of IBM stock and S&P 500
index. Correlation is computed by a moving window of 120 observations. Sample period is from January
1926 to December 1999.


In summary, a time-varying correlation bivariate GARCH(1,1) model consists
of two sets of equations. The first set of equations consists of a bivariate
GARCH(1,1) model for the conditional variances, and the second set of equations
is a GARCH(1,1) model for the correlation in Eq. (10.25). In practice, a negative
sign can be added to Eq. (10.25) if the correlation coefficient is negative.
In general, when the sign of correlation is unknown, we can use the Fisher
transformation for correlation

$$
q_t = \ln\left(\frac{1 + \rho_{21,t}}{1 - \rho_{21,t}}\right) \quad \text{or} \quad \rho_{21,t} = \frac{\exp(q_t) - 1}{\exp(q_t) + 1}
$$

and employ a GARCH model for $q_t$ to model the time-varying correlation between
two returns.
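A short sketch of this transformation pair is given below. It assumes the scaling in which the correlation equals (exp(q) - 1)/(exp(q) + 1), so that any real q maps to a correlation strictly between -1 and 1; the function names are hypothetical.

```python
import math


def q_from_rho(rho):
    """Map a correlation in (-1, 1) to the real line."""
    return math.log((1.0 + rho) / (1.0 - rho))


def rho_from_q(q):
    """Inverse map; (exp(q) - 1) / (exp(q) + 1) is identical to tanh(q / 2)."""
    return math.tanh(q / 2.0)


for rho in (-0.4, 0.0, 0.612):
    print(rho, q_from_rho(rho), rho_from_q(q_from_rho(rho)))
```

Because the map is unrestricted in sign, no prior knowledge about the direction of the correlation is needed when a model is specified for q.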


Example 10.5 (Continued). Augmenting Eq. (10.25) to the GARCH(1,1) 
model in Eq. (10.24) for the monthly log returns of IBM stock and the S&P 
500 index and performing a joint estimation, we obtain the following model for 
the two series: 


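$$
\begin{aligned}
r_{1t} &= 1.318 + 0.076\,r_{1,t-1} - 0.068\,r_{2,t-2} + a_{1t}, \\
r_{2t} &= 0.673 + a_{2t},
\end{aligned}
$$

where standard errors of the three parameters in the first equation are 0.215, 0.026,
and 0.034, respectively, and standard error of the parameter in the second equation
is 0.151. The volatility equations are

$$
\begin{bmatrix} \sigma_{11,t} \\ \sigma_{22,t} \end{bmatrix}
= \begin{bmatrix} \underset{(0.58)}{2.80} \\ \underset{(0.40)}{1.71} \end{bmatrix}
+ \begin{bmatrix} \underset{(0.013)}{0.084} & \\ \underset{(0.009)}{0.037} & \underset{(0.010)}{0.054} \end{bmatrix}
\begin{bmatrix} a_{1,t-1}^2 \\ a_{2,t-1}^2 \end{bmatrix}
+ \begin{bmatrix} \underset{(0.021)}{0.864} & \underset{(0.009)}{-0.020} \\ \underset{(0.014)}{0.058} & \underset{(0.013)}{0.914} \end{bmatrix}
\begin{bmatrix} \sigma_{11,t-1} \\ \sigma_{22,t-1} \end{bmatrix}, \tag{10.26}
$$

where, as before, standard errors are in parentheses. The conditional correlation
equation is

$$
\rho_t = \frac{\exp(q_t)}{1 + \exp(q_t)}, \quad
q_t = -2.024 + 3.983\,\rho_{t-1} + 0.088\,\frac{a_{1,t-1}a_{2,t-1}}{\sqrt{\sigma_{11,t-1}\sigma_{22,t-1}}}, \tag{10.27}
$$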


where standard errors of the estimates are 0.050, 0.090, and 0.019, respectively.
The parameters of the prior correlation equation are highly significant. Applying
the Ljung-Box statistics to the standardized residuals $\tilde{a}_t$, we have $Q_2(4) =
20.57(0.11)$ and $Q_2(8) = 36.08(0.21)$. For the squared standardized residuals, we
have $Q_2^*(4) = 16.69(0.27)$ and $Q_2^*(8) = 36.71(0.19)$. Therefore, the standardized
residuals of the model have no significant serial correlations or conditional
heteroscedasticities.

It is interesting to compare this time-varying correlation GARCH(1,1) model 
with the constant-correlation GARCH(1,1) model in Eq. (10.24). First, the mean 
and volatility equations of the two models are close. Second, Figure 10.9 shows 
the fitted conditional correlation coefficient between the monthly log returns of 
IBM stock and the S&P 500 index based on model (10.27). The plot shows that 
the correlation coefficient fluctuated over time and became smaller in recent years. 
This latter characteristic is in agreement with that of Figure 10.8. Third, the aver- 
age of the fitted correlation coefficients is 0.612, which is essentially the estimate 
0.614 of the constant-correlation model in Eq. (10.24). Fourth, using the sample 
variances of $r_{it}$ as the starting values for the conditional variances and the
observations from t = 4 to t = 888, the maximized log-likelihood function is -3691.21
for the constant-correlation GARCH(1,1) model and -3679.64 for the time-varying
correlation GARCH(1,1) model. Thus, the time-varying correlation model shows
some significant improvement over the constant-correlation model.

Figure 10.9 Fitted conditional correlation coefficient between monthly log returns of IBM stock and
S&P 500 index using time-varying correlation GARCH(1,1) model of Example 10.5. Horizontal line
denotes average of 0.612 of correlation coefficients.

Finally, consider the 1-step-ahead volatility forecasts of the two models at the forecast origin
h = 888. For the constant-correlation model in Eq. (10.24), we have $a_{1,888} = 3.075$,
$a_{2,888} = 4.931$, $\sigma_{11,888} = 77.91$, and $\sigma_{22,888} = 21.19$. Therefore, the 1-step-ahead
forecast for the conditional covariance matrix is

$$
\hat{\Sigma}_{888}(1) = \begin{bmatrix} 71.09 & 21.83 \\ 21.83 & 17.79 \end{bmatrix},
$$

where the covariance is obtained by using the constant-correlation coefficient 0.614.
For the time-varying correlation model in Eqs. (10.26) and (10.27), we have
$a_{1,888} = 3.287$, $a_{2,888} = 4.950$, $\sigma_{11,888} = 83.35$, $\sigma_{22,888} = 28.56$, and $\rho_{888} = 0.546$.
The 1-step-ahead forecast for the covariance matrix is

$$
\hat{\Sigma}_{888}(1) = \begin{bmatrix} 75.15 & 23.48 \\ 23.48 & 24.70 \end{bmatrix},
$$

where the forecast of the correlation coefficient is 0.545.
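The off-diagonal entries of the two forecast matrices follow from the variance forecasts and the correlation through Eq. (10.7). The sketch below, with a hypothetical helper name, assembles the matrices from the rounded numbers quoted above, so agreement with the reported covariances holds only up to rounding.

```python
import math


def forecast_cov(s11, s22, rho):
    """Bivariate covariance matrix D rho D built from variances and a correlation."""
    s21 = rho * math.sqrt(s11 * s22)
    return [[s11, s21], [s21, s22]]


# Constant-correlation model: variance forecasts with correlation 0.614.
ccc = forecast_cov(71.09, 17.79, 0.614)
# Time-varying correlation model: variance forecasts with forecast correlation 0.545.
tvc = forecast_cov(75.15, 24.70, 0.545)
print(ccc[1][0], tvc[1][0])
```

Both off-diagonal values land within 0.01 of the covariances 21.83 and 23.48 shown in the matrices.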

In the second method, we use the Cholesky decomposition of $\Sigma_t$ to model
time-varying correlations. For the bivariate case, the parameter vector is $\Xi_t =
(g_{11,t}, g_{22,t}, q_{21,t})'$; see Eq. (10.18). A simple GARCH(1,1) type model for $a_t$ is

$$
\begin{aligned}
g_{11,t} &= \alpha_{10} + \alpha_{11} b_{1,t-1}^2 + \beta_{11} g_{11,t-1}, \\
q_{21,t} &= \gamma_0 + \gamma_1 q_{21,t-1} + \gamma_2 a_{2,t-1}, \\
g_{22,t} &= \alpha_{20} + \alpha_{21} b_{1,t-1}^2 + \alpha_{22} b_{2,t-1}^2 + \beta_{21} g_{11,t-1} + \beta_{22} g_{22,t-1},
\end{aligned} \tag{10.28}
$$

where $b_{1t} = a_{1t}$ and $b_{2t} = a_{2t} - q_{21,t} a_{1t}$. Thus, $b_{1t}$ assumes a univariate
GARCH(1,1) model, $b_{2t}$ uses a bivariate GARCH(1,1) model, and $q_{21,t}$ is
autocorrelated and uses $a_{2,t-1}$ as an additional explanatory variable. The
probability density function relevant to maximum-likelihood estimation is given
in Eq. (10.20) with k = 2.


Example 10.5 (Continued). Again we use the monthly log returns of IBM 
stock and the S&P 500 index to demonstrate the volatility model in Eq. (10.28). 
Using the same specification as before, we obtain the fitted mean equations as 


$$
\begin{aligned}
r_{1t} &= 1.364 + 0.075\,r_{1,t-1} - 0.058\,r_{2,t-2} + a_{1t}, \\
r_{2t} &= 0.643 + a_{2t},
\end{aligned}
$$

where standard errors of the parameters in the first equation are 0.219, 0.027, and
0.032, respectively, and the standard error of the parameter in the second equation
is 0.154. These two mean equations are close to what we obtained before. The
fitted volatility model is

$$
\begin{aligned}
g_{11,t} &= 3.714 + 0.113\,b_{1,t-1}^2 + 0.804\,g_{11,t-1}, \\
q_{21,t} &= 0.0029 + 0.9915\,q_{21,t-1} - 0.0041\,a_{2,t-1}, \\
g_{22,t} &= 1.023 + 0.021\,b_{1,t-1}^2 + 0.052\,b_{2,t-1}^2 - 0.040\,g_{11,t-1} + 0.937\,g_{22,t-1},
\end{aligned} \tag{10.29}
$$

where $b_{1t} = a_{1t}$, and $b_{2t} = a_{2t} - q_{21,t} b_{1t}$. Standard errors of the parameters in the
equation of $g_{11,t}$ are 1.033, 0.022, and 0.037, respectively; those of the parameters
in the equation of $q_{21,t}$ are 0.001, 0.002, and 0.0004; and those of the parameters in
the equation of $g_{22,t}$ are 0.344, 0.007, 0.013, and 0.015, respectively. All estimates
are statistically significant at the 1% level.
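One step of the fitted recursion can be sketched as follows. The coefficients are those of the fitted model above; the function names and the lagged state passed in are hypothetical, chosen only to show how the three equations are updated and then mapped back to variances and the covariance through Eq. (10.13).

```python
def cholesky_garch_step(b1_prev, b2_prev, a2_prev, g11_prev, q21_prev, g22_prev):
    """One update of (g11, q21, g22) using the fitted coefficients."""
    g11 = 3.714 + 0.113 * b1_prev ** 2 + 0.804 * g11_prev
    q21 = 0.0029 + 0.9915 * q21_prev - 0.0041 * a2_prev
    g22 = (1.023 + 0.021 * b1_prev ** 2 + 0.052 * b2_prev ** 2
           - 0.040 * g11_prev + 0.937 * g22_prev)
    return g11, q21, g22


def to_covariance(g11, q21, g22):
    """Eq. (10.13): recover (s11, s21, s22) from the Cholesky elements."""
    return g11, q21 * g11, g22 + q21 ** 2 * g11


# Hypothetical lagged state: b1 = a1 = 3.0, q21 = 0.3, b2 = -2.0, so a2 = b2 + q21 * b1.
a2_prev = -2.0 + 0.3 * 3.0
g11, q21, g22 = cholesky_garch_step(3.0, -2.0, a2_prev, 70.0, 0.3, 15.0)
s11, s21, s22 = to_covariance(g11, q21, g22)
print(g11, q21, g22, s11, s21, s22)
```

The determinant of the implied covariance matrix equals the product of g11 and g22, so it is positive whenever both are positive.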

The conditional covariance matrix $\Sigma_t$ can be obtained from model (10.29)
by using the Cholesky decomposition in Eq. (10.12). For the bivariate case, the
relationship is given specifically in Eq. (10.13). Consequently, we obtain the
time-varying correlation coefficient as

$$
\rho_t = \frac{\sigma_{21,t}}{\sqrt{\sigma_{11,t}\sigma_{22,t}}} = \frac{q_{21,t}\sqrt{g_{11,t}}}{\sqrt{g_{22,t} + q_{21,t}^2\, g_{11,t}}}. \tag{10.30}
$$

Using the fitted values of $\sigma_{11,t}$ and $\sigma_{22,t}$, we can compute the standardized residuals
to perform model checking. The Ljung-Box statistics for the standardized residuals
of model (10.29) give $Q_2(4) = 19.77(0.14)$ and $Q_2(8) = 34.22(0.27)$. For
the squared standardized residuals, we have $Q_2^*(4) = 15.34(0.36)$ and $Q_2^*(8) =
31.87(0.37)$. Thus, the fitted model is adequate in describing the conditional mean
and volatility. The model shows a strong dynamic dependence in the correlation;
see the coefficient 0.9915 in Eq. (10.29).
Figure 10.10 Fitted conditional correlation coefficient between monthly log returns of IBM stock and
S&P 500 index using time-varying correlation GARCH(1,1) model of Example 10.5 with Cholesky 
decomposition. Horizontal line denotes average of 0.612 of the estimated coefficients. 


Figure 10.10 shows the fitted time-varying correlation coefficient in Eq. (10.30). 
It shows a smoother correlation pattern than that of Figure 10.9 and confirms 
the decreasing trend of the correlation coefficient. In particular, the fitted correla- 
tion coefficients in recent years are smaller than those of the other models. The 
two time-varying correlation models for the monthly log returns of IBM stock 
and the S&P 500 index have comparable maximized-likelihood functions of about
-3672, indicating the fits are similar. However, the approach based on the Cholesky
decomposition may have some advantages. First, it does not require any parameter
constraint in estimation to ensure the positive definiteness of $\Sigma_t$. If one also uses
log transformation for $g_{ii,t}$, then no constraints are needed for the entire volatility
model. Second, the log-likelihood function becomes simple under the transformation.
Third, the time-varying parameters $q_{ij,t}$ and $g_{ii,t}$ have nice interpretations.
However, the transformation makes inference a bit more complicated because the
fitted model may depend on the ordering of elements in $a_t$; recall that $a_{1t}$ is not
transformed. In theory, the ordering of elements in $a_t$ should have no impact on
volatility.

Finally, the 1-step-ahead forecast of the conditional covariance matrix at the
forecast origin h = 888 for the new time-varying correlation model is

$$
\hat{\Sigma}_{888}(1) = \begin{bmatrix} 73.45 & 7.34 \\ 7.34 & 17.87 \end{bmatrix}.
$$

The correlation coefficient of the prior forecast is 0.203, which is substantially
smaller than those of the previous two models. However, forecasts of the conditional
variances are similar to those obtained before.
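The correlation implied by a covariance forecast can be recovered through the Cholesky elements, which is the route taken by Eq. (10.30). The sketch below uses hypothetical function names and the rounded forecast values for the Cholesky-based model (variances 73.45 and 17.87, covariance 7.34), so the result matches the reported 0.203 only up to rounding.

```python
import math


def cholesky_from_cov(s11, s21, s22):
    """Eq. (10.14): bivariate Cholesky elements (g11, q21, g22)."""
    return s11, s21 / s11, s22 - s21 * s21 / s11


def corr_from_cholesky(g11, q21, g22):
    """Eq. (10.30): correlation written in terms of the Cholesky elements."""
    return q21 * math.sqrt(g11) / math.sqrt(g22 + q21 * q21 * g11)


g11, q21, g22 = cholesky_from_cov(73.45, 7.34, 17.87)
rho = corr_from_cholesky(g11, q21, g22)
direct = 7.34 / math.sqrt(73.45 * 17.87)
print(round(rho, 3), round(direct, 3))
```

Both routes give the same number, since Eq. (10.30) is only a rewriting of the usual correlation formula.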


10.4.3 Dynamic Correlation Models 


Using the parameterization in Eq. (10.7), several authors have proposed parsimo- 
nious models for p, to describe the time-varying correlations. We refer to those 
models as the dynamic conditional correlation (DCC) models. 

For $k$-dimensional returns, Tse and Tsui (2002) assume that the conditional
correlation matrix $\rho_t$ follows the model

$$\rho_t = (1 - \theta_1 - \theta_2)\rho + \theta_1 \rho_{t-1} + \theta_2 \psi_{t-1},$$

where $\theta_1$ and $\theta_2$ are scalar parameters, $\rho$ is a $k \times k$
positive-definite matrix with unit diagonal elements, and $\psi_{t-1}$ is the
$k \times k$ sample correlation matrix using shocks from $t-m, \ldots, t-1$ for a
prespecified $m$. Typically, one assumes that $0 \le \theta_i < 1$ and
$\theta_1 + \theta_2 < 1$ so that the resulting correlation matrix $\rho_t$ is positive
definite for all $t$. For a given $\rho$, the model is parsimonious. In applications,
the choice of $\rho$ and $m$ deserves a careful investigation. One possibility is to let
$\rho$ be the sample correlation matrix of the returns. The correlation equation then
only employs two parameters.
Engle (2002) proposes the model

$$\rho_t = J_t Q_t J_t,$$

where $Q_t = (q_{ij,t})_{k \times k}$ is a positive-definite matrix,
$J_t = \mathrm{diag}\{q_{11,t}^{-1/2}, \ldots, q_{kk,t}^{-1/2}\}$, and $Q_t$ satisfies

$$Q_t = (1 - \theta_1 - \theta_2)\bar{Q} + \theta_1 \epsilon_{t-1}\epsilon_{t-1}' + \theta_2 Q_{t-1},$$

where $\epsilon_t$ is the standardized innovation vector with elements
$\epsilon_{it} = a_{it}/\sqrt{\sigma_{ii,t}}$, $\bar{Q}$ is the unconditional covariance
matrix of $\epsilon_t$, and $\theta_1$ and $\theta_2$ are nonnegative scalar parameters
satisfying $0 < \theta_1 + \theta_2 < 1$. The $J_t$ matrix is a normalization matrix to
guarantee that $\rho_t$ is a correlation matrix.
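To make the recursion concrete, the sketch below carries out one update of Engle's $Q_t$ equation followed by the normalization $\rho_t = J_t Q_t J_t$. It is only an illustration: the function names, the starting matrix, the innovation vector, and the values of $\theta_1$ and $\theta_2$ are hypothetical and are not estimates from the text.

```python
import math

def dcc_step(q_prev, eps_prev, q_bar, theta1, theta2):
    """One step of Q_t = (1 - th1 - th2) Qbar + th1 eps eps' + th2 Q_{t-1}."""
    k = len(q_bar)
    w = 1.0 - theta1 - theta2
    return [[w * q_bar[i][j]
             + theta1 * eps_prev[i] * eps_prev[j]
             + theta2 * q_prev[i][j]
             for j in range(k)] for i in range(k)]

def normalize(q):
    """rho_t = J_t Q_t J_t with J_t = diag(q_ii^{-1/2})."""
    k = len(q)
    return [[q[i][j] / math.sqrt(q[i][i] * q[j][j]) for j in range(k)]
            for i in range(k)]

# Hypothetical inputs, chosen only for illustration
q_bar = [[1.0, 0.3],
         [0.3, 1.0]]
q_t = dcc_step(q_prev=q_bar, eps_prev=[1.5, -0.5], q_bar=q_bar,
               theta1=0.05, theta2=0.90)
rho_t = normalize(q_t)   # unit diagonal by construction
```

Because both $\theta_1$ and $\theta_2$ are scalars, every element of $Q_t$ is updated with the same weights, which is the restriction discussed next.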

An obvious drawback of the prior two models is that $\theta_1$ and $\theta_2$ are scalar
so that all the conditional correlations have the same dynamics. This might be hard
to justify in real applications, especially when the dimension $k$ is large.

Tsay (2006) extends the previous DCC models in two ways. First, the standardized
innovations are assumed to follow a multivariate Student-t distribution of Eq.
(10.42). Second, the marginal volatility models have leverage effects. Specifically,
the volatility equation for $r_t$ is

$$D_t^2 = \Lambda_0 + \Lambda_1 D_{t-1}^2 + \Lambda_2 A_{t-1}^2 + \Lambda_3 L_{t-1}^2, \tag{10.31}$$

where $D_t$ is the diagonal matrix of volatilities as defined in Eq. (10.7),
$A_t = \mathrm{diag}\{a_{1t}, \ldots, a_{kt}\}$,
$\Lambda_i = \mathrm{diag}\{\ell_{11,i}, \ldots, \ell_{kk,i}\}$ are $k \times k$ diagonal
matrices of parameters, and $L_{t-1} = \mathrm{diag}\{L_{1,t-1}, \ldots, L_{k,t-1}\}$ is
also a $k \times k$ diagonal matrix with diagonal elements

$$L_{i,t-1} = \begin{cases} a_{i,t-1} & \text{if } a_{i,t-1} < 0, \\ 0 & \text{otherwise.} \end{cases}$$

In Eq. (10.31), the parameters $\ell_{ii,j}$ satisfy
$0 < \sum_{j=1}^{3} \ell_{ii,j} < 1$, $\ell_{ii,0} > 0$ for $i = 1, \ldots, k$, and
$\ell_{ii,j} \ge 0$ for all positive $i$ and $j$. The constraint ensures that the
volatilities exist. Of course, if $\Lambda_3 = 0$, then there is no leverage effect.
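Because all matrices in Eq. (10.31) are diagonal, the equation can be read one asset at a time. The sketch below writes out a single update for one asset, assuming the componentwise form $\sigma_{i,t}^2 = \ell_{ii,0} + \ell_{ii,1}\sigma_{i,t-1}^2 + \ell_{ii,2}a_{i,t-1}^2 + \ell_{ii,3}L_{i,t-1}^2$. The function name and the parameter values are hypothetical and serve only to show that a negative shock raises the next-period variance by more than a positive shock of the same size.

```python
def leverage_variance_step(sig2_prev, a_prev, l0, l1, l2, l3):
    """One-asset version of Eq. (10.31) with a leverage term."""
    lev = a_prev if a_prev < 0 else 0.0      # L_{i,t-1}: active only for negative shocks
    return l0 + l1 * sig2_prev + l2 * a_prev ** 2 + l3 * lev ** 2

# Hypothetical parameter values, not estimates from the text
params = dict(l0=0.01, l1=0.95, l2=0.02, l3=0.02)

after_gain = leverage_variance_step(4.0, 2.0, **params)    # positive shock
after_loss = leverage_variance_step(4.0, -2.0, **params)   # negative shock, same size
```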

The correlation equation is

$$\rho_t = (1 - \theta_1 - \theta_2)\bar{\rho} + \theta_1 \psi_{t-1} + \theta_2 \rho_{t-1}, \tag{10.32}$$

where $\bar{\rho}$ is the sample correlation matrix of the returns and
$0 < \theta_1 + \theta_2 < 1$ with $\theta_i > 0$ for $i = 1, 2$.


Example 10.6. To illustrate the DCC model, we consider the daily exchange
rates of the U.S. dollar versus the European euro and the Japanese yen and the stock
prices of IBM and Dell from January 1999 to December 2004. The exchange rates
are the noon spot rates obtained from the Federal Reserve Bank of St. Louis and
the stock returns are from the Center for Research in Security Prices (CRSP). We
compute the simple returns of the exchange rates and remove returns for those
days when one of the markets was not open. This results in a four-dimensional
return series with 1496 observations. The return vector is
$r_t = (r_{1t}, r_{2t}, r_{3t}, r_{4t})'$ with $r_{1t}$ and $r_{2t}$ being the returns of
the euro and yen exchange rates, respectively, and $r_{3t}$ and $r_{4t}$ being the
returns of IBM and Dell stock, respectively. All returns are in percentages.
Figure 10.11 shows the time plot of the return series. From the plot, equity returns
have higher variability than the exchange rate returns, and the variability of equity
returns appears to be decreasing in later years. Table 10.1 provides some descriptive
statistics of the return series. As expected, the means of the returns are essentially
zero and all four series have heavy tails with positive excess kurtosis.

The equity returns have some serial correlations, but the magnitude is small. If 
multivariate Ljung-Box statistics are used, we have Q(3) = 59.12 with a p value
of 0.13 and Q(5) = 106.44 with a p value of 0.03. For simplicity, we use the 
sample mean as the mean equation and apply the proposed multivariate volatility 
model to the mean-corrected data. In estimation, we start with a general model, but 
add some equality constraints as some estimates appear to be close to each other. 
The results are given in Table 10.2 along with the value of likelihood function 
evaluated at the estimates. 

For each estimated multivariate volatility model in Table 10.2, we compute the
standardized residuals as

$$\hat{\epsilon}_t = \hat{\Sigma}_t^{-1/2} a_t,$$

Figure 10.11 Time plots of daily simple return series from January 1999 to December 2004:
(a) dollar-euro exchange rate, (b) dollar-yen exchange rate, (c) IBM stock, and (d) Dell stock.


TABLE 10.1 Descriptive Statistics of Daily Returns of Example 10.6^a

Asset                 USEU       JPUS       IBM        DELL
Mean                  0.0091    -0.0059     0.0066     0.0028
Standard error        0.6469     0.6626     5.4280    10.1954
Skewness              0.0342    -0.1674    -0.0530    -0.0383
Excess kurtosis       2.7090     2.0332     6.2164     3.3054
Box-Ljung Q(12)      12.5        6.4       24.1       24.1

^a The returns are in percentages, and the sample period is from January 1999 to
December 2004 for 1496 observations.


where $\hat{\Sigma}_t^{1/2}$ is the symmetric square root matrix of the estimated
volatility matrix $\hat{\Sigma}_t$. We apply the multivariate Ljung-Box statistics to the
standardized residuals $\hat{\epsilon}_t$ and their squared process $\hat{\epsilon}_t^2$
of a fitted model to check model adequacy. For the full model in Table 10.2(a), we
have Q(10) = 167.79(0.32) and Q(10) = 110.19(1.00) for $\hat{\epsilon}_t$ and
$\hat{\epsilon}_t^2$, respectively, where the number in parentheses denotes p value.
Clearly, the model adequately describes the first two moments of the return series.
For the model in Table 10.2(b), we have Q(10) = 168.59(0.31) and Q(10) =
109.93(1.00). For the final restricted model in Table 10.2(c), we obtain Q(10) =
168.50(0.31) and Q(10) = 111.75(1.00). Again, the restricted models are capable
of describing the mean and volatility of the return series.

TABLE 10.2 Estimation Results of Multivariate Volatility Models for Example 10.6^a

(a) Full Model Estimation with $L_{\max} = -9175.80$

$\Lambda_0$         $\Lambda_1$         $\Lambda_2$         $(v, \theta_1, \theta_2)'$
0.0041(0.0033)      0.9701(0.0114)      0.0214(0.0075)      7.8729(0.4693)
0.0088(0.0038)      0.9515(0.0126)      0.0281(0.0084)      0.9808(0.0029)
0.0071(0.0053)      0.9636(0.0092)      0.0326(0.0087)      0.0137(0.0025)
0.0150(0.0136)      0.9531(0.0155)      0.0461(0.0164)

(b) Restricted Model with $L_{\max} = -9176.62$

$\Lambda_0$         $\Lambda_1 = \lambda \times I$   $\Lambda_2$         $(v, \theta_1, \theta_2)'$
0.0066(0.0028)      0.9606(0.0068)      0.0255(0.0068)      7.8772(0.7144)
0.0066(0.0023)                          0.0240(0.0059)      0.9809(0.0042)
0.0080(0.0052)                          0.0355(0.0068)      0.0137(0.0025)
0.0108(0.0086)                          0.0385(0.0073)

(c) Final Restricted Model with $L_{\max} = -9177.44$

$\Lambda_0(\lambda_1, \lambda_1, \lambda_3, \lambda_4)$   $\Lambda_1 = \lambda \times I$   $\Lambda_2(b_1, b_1, b_2, b_2)$   $(v, \theta_1, \theta_2)'$
0.0067(0.0021)      0.9603(0.0063)      0.0248(0.0048)      7.9180(0.6952)
0.0067(0.0021)                          0.0248(0.0048)      0.9809(0.0042)
0.0061(0.0044)                          0.0372(0.0061)      0.0137(0.0028)
0.0148(0.0084)                          0.0372(0.0061)

(d) Model with Leverage Effects, $L_{\max} = -9169.04$

$\Lambda_0(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$   $\Lambda_1 = \lambda \times I$   $\Lambda_2(b_1, b_2, b_3, b_4)$   $(v, \theta_1, \theta_2)'$
0.0064(0.0027)      0.9600(0.0065)      0.0254(0.0063)      8.4527(0.7556)
0.0066(0.0023)                          0.0236(0.0054)      0.9810(0.0044)
0.0128(0.0055)                          0.0241(0.0056)      0.0132(0.0027)
0.0210(0.0099)                          0.0286(0.0062)

^a $L_{\max}$ denotes the value of likelihood function evaluated at the estimates, $v$ is
the degrees of freedom of the multivariate Student-t distribution, and the numbers in
parentheses are asymptotic standard errors.

From Table 10.2, we make the following observations. First, using the likelihood
ratio test, we cannot reject the final restricted model compared with the full model.
This results in a very parsimonious model consisting of only 9 parameters for the
time-varying correlations of the four-dimensional return series. Second, for the two
stock return series, the constant terms in $\Lambda_0$ are not significantly different
from zero, and the sum of GARCH parameters is 0.0372 + 0.9603 = 0.9975, which is very
close to unity. Consequently, the volatility series of the two equity returns exhibit
IGARCH behavior. On the other hand, the volatility series of the two exchange rate
returns appear to have a nonzero constant term and high persistence in GARCH
parameters. Third, to better understand the efficacy of the proposed model, we
compare the results of the final restricted model with those of rolling estimates.
The rolling estimates of covariance matrix are obtained using a moving window of
size 69, which is the approximate number of trading days in a quarter. Figure 10.12
shows the time plot of estimated volatility. The solid line is the volatility obtained
by the proposed model and the dashed line is for volatility of the rolling estimation.
The overall pattern seems similar, but, as expected, the rolling estimates respond
more slowly than the proposed model to large innovations. This is shown by the
faster rise and decay of the volatility obtained by the proposed model. Figure 10.13
shows the time-varying correlations of the four asset returns. The solid line denotes
correlations obtained by the final restricted model of Table 10.2, whereas the dashed
line is for rolling estimation. The correlations of the proposed model seem to be
smoother.

Figure 10.12 Time plots of estimated volatility series of four asset returns. Solid line is from proposed
model and dashed line is from a rolling estimation with window size 69: (a) dollar-euro exchange rate,
(b) dollar-yen exchange rate, (c) IBM stock, and (d) Dell stock.

Figure 10.13 Time plots of time-varying correlations between percentage simple returns of four assets
from January 1999 to December 2004. Solid line is from the proposed model, whereas dashed line is
from a rolling estimation with window size 69.

Table 10.2(d) gives the results of a fitted integrated GARCH-type model with
leverage effects. The leverage effects are statistically significant for equity returns
only and are in the form of an IGARCH model. Specifically, the $\Lambda_3$ matrix of the
volatility equation in Eq. (10.31) is

$$\Lambda_3 = \mathrm{diag}\{0, 0, (1 - 0.96 - 0.0241), (1 - 0.96 - 0.0286)\} = \mathrm{diag}\{0, 0, 0.0159, 0.0114\}.$$

Although the magnitudes of the leverage parameters are small, they are statistically
significant. This is shown by the likelihood ratio test. Specifically, comparing the
fitted models in Table 10.2(b) and (d), the likelihood ratio statistic is 15.16, which
has a p value of 0.0005 based on the chi-squared distribution with 2 degrees of
freedom.
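The likelihood ratio comparisons quoted for Table 10.2 can be retraced with a few lines of Python. This is a sketch under stated assumptions: the helper names are ours, the chi-squared tail probability uses a closed form that holds only for even degrees of freedom, and the 6 degrees of freedom for the comparison of models (a) and (c) is our own count (15 parameters in the full model against the 9 mentioned in the text).

```python
import math

def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-squared variate; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def lr_statistic(loglik_restricted, loglik_general):
    """Likelihood ratio statistic 2 * (l_general - l_restricted)."""
    return 2.0 * (loglik_general - loglik_restricted)

# Models (b) and (d) of Table 10.2: two leverage parameters are added
lr_bd = lr_statistic(-9176.62, -9169.04)   # 15.16
p_bd = chi2_sf_even(lr_bd, 2)              # about 0.0005

# Models (c) and (a): df = 15 - 9 = 6 is an assumption based on our parameter count
lr_ca = lr_statistic(-9177.44, -9175.80)   # 3.28
p_ca = chi2_sf_even(lr_ca, 6)              # about 0.77, so (c) is not rejected
```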


10.5 HIGHER DIMENSIONAL VOLATILITY MODELS 


In this section, we make use of the sequential nature of Cholesky decomposition to
suggest a strategy for building a high-dimensional volatility model. Again write the
vector return series as $r_t = \mu_t + a_t$. The mean equations for $r_t$ can be specified
by using the methods of Chapter 8. A simple vector AR model is often sufficient.
Here we focus on building a volatility model using the shock process $a_t$.

Based on the discussion of Cholesky decomposition in Section 10.3, the orthogonal
transformation from $a_{it}$ to $b_{it}$ only involves $b_{jt}$ for $j < i$. In addition,
the time-varying volatility models built in Section 10.4 appear to be nested in the sense
that the model for $g_{ii,t}$ depends only on quantities related to $b_{jt}$ for $j < i$.
Consequently, we consider the following sequential procedure to build a multivariate
volatility model:


1. Select a market index or a stock return that is of major interest. Build a 
univariate volatility model for the selected return series. 

2. Augment a second return series to the system, perform the orthogonal trans- 
formation on the shock process of this new return series, and build a bivariate 
volatility model for the system. The parameter estimates of the univariate
model in step 1 can be used as the starting values in bivariate estimation.

3. Augment a third return series to the system, perform the orthogonal trans- 
formation on this newly added shock process, and build a three-dimensional 
volatility model. Again parameter estimates of the bivariate model can be 
used as the starting values in the three-dimensional estimation. 

4. Continue the augmentation until a joint volatility model is built for all the 
return series of interest. 
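The orthogonal transformation used in each step of this procedure can be sketched as follows. The function name and the numerical values are hypothetical; the recursion is $b_{it} = a_{it} - \sum_{j<i} q_{ij,t} b_{jt}$, which matches the expressions for $b_{1t}$, $b_{2t}$, and $b_{3t}$ used in this chapter.

```python
def orthogonalize(a, q):
    """Map shocks a_t to orthogonal b_t: b_i = a_i - sum_{j<i} q[i][j] * b_j."""
    b = []
    for i, a_i in enumerate(a):
        b.append(a_i - sum(q[i][j] * b[j] for j in range(i)))
    return b

# Hypothetical shocks and lower-triangular coefficients q_{ij,t}
a_t = [1.0, 2.0, -0.5]
q_t = [[0.0, 0.0, 0.0],
       [0.5, 0.0, 0.0],
       [0.2, 0.1, 0.0]]

b_three = orthogonalize(a_t, q_t)        # three-dimensional system
b_two = orthogonalize(a_t[:2], q_t)      # bivariate system built first
```

Since $b_{it}$ depends only on $b_{jt}$ with $j < i$, augmenting the system with a new series leaves the transformed shocks of the earlier series unchanged, which is what makes the sequential procedure workable.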


Finally, model checking should be performed in each step to ensure the adequacy 
of the fitted model. Experience shows that this sequential procedure can sim- 
plify substantially the complexity involved in building a high-dimensional volatility 
model. In particular, it can markedly reduce the computing time in estimation.

Figure 10.14 Time plots of daily log returns in percentages of (a) S&P 500 index and stocks of (b)
Cisco Systems and (c) Intel Corporation from January 2, 1991, to December 31, 1999.


Example 10.7. We demonstrate the proposed sequential procedure by building
a volatility model for the daily log returns of the S&P 500 index and the stocks
of Cisco Systems and Intel Corporation. The data span is from January 2, 1991,
to December 31, 1999, with 2275 observations. The log returns are in percentages
and shown in Figure 10.14. Components of the return series are ordered as
$r_t = (\mathrm{SP5}_t, \mathrm{CSCO}_t, \mathrm{INTC}_t)'$. The sample means, standard
errors, and correlation matrix of the data are

$$\hat{\mu} = \begin{bmatrix} 0.066 \\ 0.257 \\ 0.156 \end{bmatrix}, \quad
\begin{bmatrix} \hat{\sigma}_1 \\ \hat{\sigma}_2 \\ \hat{\sigma}_3 \end{bmatrix} =
\begin{bmatrix} 0.875 \\ 2.853 \\ 2.464 \end{bmatrix}, \quad
\hat{\rho} = \begin{bmatrix} 1.00 & 0.52 & 0.50 \\ 0.52 & 1.00 & 0.47 \\ 0.50 & 0.47 & 1.00 \end{bmatrix}.$$


Using the Ljung-Box statistics to detect any serial dependence in the return
series, we obtain $Q_3(1) = 26.20$, $Q_3(4) = 79.73$, and $Q_3(8) = 123.68$. These test
statistics are highly significant with p values close to zero as compared with
chi-squared distributions with degrees of freedom 9, 36, and 72, respectively.
There is indeed some serial dependence in the data. Table 10.3 gives the first
five lags of sample cross-correlation matrices shown in the simplified notation of
Chapter 8. An examination of the table shows that (a) the daily log returns of
the S&P 500 index do not depend on the past returns of Cisco or Intel, (b) the
log return of Cisco stock has some serial correlations and depends on the past
returns of the S&P 500 index (see lags 2 and 5), and (c) the log return of Intel
stock depends on the past returns of the S&P 500 index (see lags 1 and 5). These
observations are similar to those between the returns of IBM stock and the S&P
500 index analyzed in Chapter 8. They suggest that returns of individual large-cap
companies tend to be affected by the past behavior of the market. However,
the market return is not significantly affected by the past returns of individual
companies.

TABLE 10.3 Sample Cross-Correlation Matrices of Daily Log Returns of S&P 500
Index and Stocks of Cisco Systems and Intel Corporation from January 2, 1991, to
December 31, 1999

Turning to volatility modeling and following the suggested procedure, we start
with the log returns of the S&P 500 index and obtain the model

$$\begin{aligned}
r_{1t} &= 0.078 + 0.042 r_{1,t-1} - 0.062 r_{1,t-3} - 0.048 r_{1,t-4} - 0.052 r_{1,t-5} + a_{1t}, \\
\sigma_{11,t} &= 0.013 + 0.092 a_{1,t-1}^2 + 0.894 \sigma_{11,t-1},
\end{aligned} \tag{10.33}$$

where standard errors of the parameters in the mean equation are 0.016, 0.023,
0.020, 0.022, and 0.020, respectively, and those of the parameters in the volatility
equation are 0.002, 0.006, and 0.007, respectively. Univariate Ljung-Box statistics
of the standardized residuals and their squared series fail to detect any remaining
serial correlation or conditional heteroscedasticity in the data. Indeed, we have
Q(10) = 7.38(0.69) for the standardized residuals and Q(10) = 3.14(0.98) for the
squared series.
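For readers who wish to reproduce this kind of check, a minimal sketch of the univariate Ljung-Box statistic, $Q(m) = T(T+2)\sum_{\ell=1}^{m} \hat{\rho}_\ell^2/(T-\ell)$, is given below. The function name is ours and the input series is a toy example, not the S&P 500 residuals; in practice the function would be applied to the standardized residuals and to their squares.

```python
def ljung_box(x, m):
    """Univariate Ljung-Box Q(m) = T(T+2) * sum_{l=1}^{m} rho_l^2 / (T - l)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)                 # lag-0 sum of squares
    q = 0.0
    for lag in range(1, m + 1):
        rho = sum(d[t] * d[t - lag] for t in range(lag, n)) / c0
        q += rho * rho / (n - lag)
    return n * (n + 2) * q

# Toy series with strong negative lag-1 autocorrelation (illustration only)
toy = [1.0, -1.0] * 50
q1 = ljung_box(toy, 1)
```

A large value of Q(m) relative to the chi-squared reference distribution signals remaining serial dependence; the toy series above is constructed to produce exactly that.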

Augmenting the daily log returns of Cisco stock to the system, we build a
bivariate model with mean equations given by

$$\begin{aligned}
r_{1t} &= 0.065 - 0.046 r_{1,t-3} + a_{1t}, \\
r_{2t} &= 0.325 + 0.195 r_{1,t-2} - 0.091 r_{2,t-2} + a_{2t},
\end{aligned} \tag{10.34}$$

where all of the estimates are statistically significant at the 1% level. Using the
notation of Cholesky decomposition, we obtain the volatility equations as

$$\begin{aligned}
g_{11,t} &= 0.006 + 0.051 b_{1,t-1}^2 + 0.943 g_{11,t-1}, \\
q_{21,t} &= 0.331 + 0.790 q_{21,t-1} - 0.041 a_{2,t-1}, \\
g_{22,t} &= 0.177 + 0.082 b_{2,t-1}^2 + 0.890 g_{22,t-1},
\end{aligned} \tag{10.35}$$

where $b_{1t} = a_{1t}$, $b_{2t} = a_{2t} - q_{21,t} b_{1t}$, standard errors of the
parameters in the equation of $g_{11,t}$ are 0.001, 0.005, and 0.006, those of the
parameters in the equation of $q_{21,t}$ are 0.156, 0.099, and 0.011, and those of the
parameters in the equation of $g_{22,t}$ are 0.029, 0.008, and 0.011, respectively. The
bivariate Ljung-Box statistics of the standardized residuals fail to detect any
remaining serial dependence or conditional heteroscedasticity. The bivariate model is
adequate. Comparing with Eq. (10.33), we see that the difference between the
marginal and univariate models of $r_{1t}$ is small.

The next and final step is to augment the daily log returns of Intel stock to the
system. The mean equations become

$$\begin{aligned}
r_{1t} &= 0.065 - 0.043 r_{1,t-3} + a_{1t}, \\
r_{2t} &= 0.326 + 0.201 r_{1,t-2} - 0.089 r_{2,t-1} + a_{2t}, \\
r_{3t} &= 0.192 - 0.264 r_{1,t-1} + 0.059 r_{3,t-1} + a_{3t},
\end{aligned} \tag{10.36}$$

where standard errors of the parameters in the first equation are 0.016 and 0.017,
those of the parameters in the second equation are 0.052, 0.059, and 0.021, and
those of the parameters in the third equation are 0.050, 0.057, and 0.022, respectively.
All estimates are statistically significant at about the 1% level. As expected,
the mean equations for $r_{1t}$ and $r_{2t}$ are essentially the same as those in the
bivariate case.
The three-dimensional time-varying volatility model becomes a bit more complicated,
but it remains manageable as

$$\begin{aligned}
g_{11,t} &= 0.006 + 0.050 b_{1,t-1}^2 + 0.943 g_{11,t-1}, \\
q_{21,t} &= 0.277 + 0.824 q_{21,t-1} - 0.035 a_{2,t-1}, \\
g_{22,t} &= 0.178 + 0.082 b_{2,t-1}^2 + 0.889 g_{22,t-1}, \\
q_{31,t} &= 0.039 + 0.973 q_{31,t-1} + 0.010 a_{3,t-1}, \\
q_{32,t} &= 0.006 + 0.981 q_{32,t-1} + 0.004 a_{2,t-1}, \\
g_{33,t} &= 1.188 + 0.053 b_{3,t-1}^2 + 0.687 g_{33,t-1} - 0.019 g_{22,t-1},
\end{aligned} \tag{10.37}$$

where $b_{1t} = a_{1t}$, $b_{2t} = a_{2t} - q_{21,t} b_{1t}$,
$b_{3t} = a_{3t} - q_{31,t} b_{1t} - q_{32,t} b_{2t}$, and standard errors of the
parameters are given in Table 10.4. Except for the constant term of the $q_{32,t}$
equation, all estimates are significant at the 5% level. Let
$\tilde{a}_t = (a_{1t}/\hat{\sigma}_{1t}, a_{2t}/\hat{\sigma}_{2t}, a_{3t}/\hat{\sigma}_{3t})'$
be the standardized residual series, where $\hat{\sigma}_{it} = \sqrt{\hat{\sigma}_{ii,t}}$
is the fitted conditional standard error of the $i$th return. The Ljung-Box statistics
of $\tilde{a}_t$ give $Q_3(4) = 34.48(0.31)$ and $Q_3(8) = 60.42(0.70)$, where the degrees
of freedom of the chi-squared distributions are 31 and 67, respectively, after
adjusting for the number of parameters used in the mean equations. For the squared
standardized residual series $\tilde{a}_t^2$, we have $Q_3(4) = 28.71(0.58)$ and
$Q_3(8) = 52.00(0.91)$. Therefore, the fitted model appears to be adequate in modeling
the conditional means and volatilities.

TABLE 10.4 Standard Errors of Parameter Estimates of Three-Dimensional
Volatility Model for Daily Log Returns in Percentages of S&P 500 Index and Stocks
of Cisco Systems and Intel Corporation from January 2, 1991, to December 31, 1999^a

Equation      Standard Error            Equation      Standard Error
$g_{11,t}$    0.001  0.005  0.006       $q_{21,t}$    0.135  0.086  0.010
$g_{22,t}$    0.029  0.009  0.011       $q_{31,t}$    0.017  0.012  0.004
$g_{33,t}$    0.015  0.100              $q_{32,t}$    0.004  0.013  0.001

^a The ordering of the parameters is the same as appears in Eq. (10.37).

The three-dimensional volatility model in Eq. (10.37) shows some interesting
features. First, it is essentially a time-varying correlation GARCH(1,1) model
because only lag-1 variables are used in the equations. Second, the volatility of
the daily log returns of the S&P 500 index does not depend on the past volatilities
of Cisco or Intel stock returns. Third, by taking the inverse transformation
of the Cholesky decomposition, the volatilities of daily log returns of Cisco and
Intel stocks depend on the past volatility of the market return; see the relationships
between elements of $\Sigma_t$, $L_t$, and $G_t$ given in Section 10.3. Fourth, the
correlation quantities $q_{ij,t}$ have high persistence with large AR(1) coefficients.

Figure 10.15 shows the fitted volatility processes of the model (i.e.,
$\hat{\sigma}_{ii,t}$) for the data. The volatility of the index return is much smaller
than those of the two individual stock returns. The plots also show that the volatility
of the index return has increased in recent years, but this is not the case for the
return of Cisco Systems. Figure 10.16 shows the time-varying correlation coefficients
between the three return series. Of particular interest is to compare Figures 10.15
and 10.16. They show that the correlation coefficient between two return series
increases when the returns are volatile. This is in agreement with the empirical study
of relationships between international stock market indexes for which the correlation
between two markets tends to increase during a financial crisis.

The volatility model in Eq. (10.37) consists of two sets of equations. The first
set of equations describes the time evolution of conditional variances (i.e.,
$g_{ii,t}$), and the second set of equations deals with correlation coefficients (i.e.,
$q_{ij,t}$ with $i > j$). For this particular data set, an AR(1) model might be
sufficient for the correlation equations. Similarly, a simple AR model might also be
sufficient for the conditional variances. Define
$v_t = (v_{11,t}, v_{22,t}, v_{33,t})'$, where $v_{ii,t} = \ln(g_{ii,t})$, and
$q_t = (q_{21,t}, q_{31,t}, q_{32,t})'$. The previous discussion suggests that we can
use the simple lag-1 models

$$v_t = c_1 + \beta_1 v_{t-1}, \qquad q_t = c_2 + \beta_2 q_{t-1}$$

as exact functions to model the volatility of asset returns, where $c_i$ are constant
vectors and $\beta_i$ are $3 \times 3$ real-valued matrices. If a noise term is also
included in the above equations, then the models become

$$v_t = c_1 + \beta_1 v_{t-1} + e_{1t}, \qquad q_t = c_2 + \beta_2 q_{t-1} + e_{2t},$$

where $e_{it}$ are random shocks with mean zero and a positive-definite covariance
matrix, and we have a simple multivariate stochastic volatility model. In a recent
manuscript, Chib, Nardari, and Shephard (1999) use Markov chain Monte Carlo
(MCMC) methods to study high-dimensional stochastic volatility models. The
model considered there allows for time-varying correlations, but in a relatively
restrictive manner. Additional references of multivariate volatility model include
Harvey, Ruiz, and Shephard (1994). We discuss MCMC methods in volatility modeling
in Chapter 12.

Figure 10.15 Time plots of fitted volatilities for daily log returns, in percentages, of (a) S&P 500
index and stocks of (b) Cisco Systems and (c) Intel Corporation from January 2, 1991, to December
31, 1999.

Figure 10.16 Time plots of fitted time-varying correlation coefficients between daily log returns of
S&P 500 index and stocks of Cisco Systems and Intel Corporation from January 2, 1991, to December
31, 1999.


10.6 FACTOR-VOLATILITY MODELS 


Another approach to simplifying the dynamic structure of a multivariate volatility
process is to use factor models. In practice, the “common factors” can be determined
a priori by substantive matter or empirical methods. As an illustration, we use the
factor analysis of Chapter 8 to discuss factor-volatility models. Because volatility
models are concerned with the evolution over time of the conditional covariance
matrix of $a_t$, where $a_t = r_t - \mu_t$, a simple way to identify the “common factors”
in volatility is to perform a principal component analysis (PCA) on $a_t$; see the
PCA of Chapter 8. Building a factor-volatility model thus involves a three-step
procedure:


• Select the first few principal components that explain a high percentage of
variability in $a_t$.

• Build a volatility model for the selected principal components.

• Relate the volatility of each $a_{it}$ series to the volatilities of the selected
principal components.


The objective of such a procedure is to reduce the dimension but maintain an 
accurate approximation of the multivariate volatility. 


Example 10.8. Consider again the monthly log returns, in percentages, of IBM
stock and the S&P 500 index of Example 10.5. Using the bivariate AR(3) model
of Example 8.4, we obtain an innovational series $a_t$. Performing a PCA on $a_t$
based on its covariance matrix, we obtain eigenvalues 63.373 and 13.489. The
first eigenvalue explains 82.2% of the generalized variance of $a_t$. Therefore, we
may choose the first principal component $x_t = 0.797 a_{1t} + 0.604 a_{2t}$ as the
common factor. Alternatively, as shown by the model in Example 8.4, the serial
dependence in $r_t$ is weak and, hence, one can perform the PCA on $r_t$ directly. For
this particular instance, the two eigenvalues of the sample covariance matrix of $r_t$
are 63.625 and 13.513, which are essentially the same as those based on $a_t$. The first
principal component explains approximately 82.5% of the generalized variance of
$r_t$, and the corresponding common factor is $x_t = 0.796 r_{1t} + 0.605 r_{2t}$.
Consequently, for the two monthly log return series considered, the effect of the
conditional mean equations on PCA is negligible.

Based on the prior discussion and for simplicity, we use
$x_t = 0.796 r_{1t} + 0.605 r_{2t}$ as a common factor for the two monthly return series.
Figure 10.17(a) shows the time plot of this common factor. If univariate Gaussian
GARCH models are entertained, we obtain the following model for $x_t$:

$$\begin{aligned}
x_t &= 1.317 + 0.096 x_{t-1} + a_t, \qquad a_t = \sigma_t \epsilon_t, \\
\sigma_t^2 &= 3.834 + 0.110 a_{t-1}^2 + 0.825 \sigma_{t-1}^2.
\end{aligned} \tag{10.38}$$

All parameter estimates of the previous model are highly significant at the 1% level,
and the Ljung-Box statistics of the standardized residuals and their squared series
fail to detect any model inadequacy. Figure 10.17(b) shows the fitted volatility of
$x_t$ [i.e., the sample $\sigma_t^2$ series in Eq. (10.38)].
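The volatility recursion of Eq. (10.38) is easy to trace numerically. The sketch below uses the reported estimates (3.834, 0.110, 0.825); the function name and the way the recursion is started are our own choices for illustration. It also evaluates the unconditional variance of $a_t$ implied by a GARCH(1,1) model, $\alpha_0/(1 - \alpha_1 - \beta_1)$.

```python
import math

def garch11_path(shocks, omega, alpha, beta, sigma2_start):
    """Conditional variances from sigma^2_t = omega + alpha*a_{t-1}^2 + beta*sigma^2_{t-1}."""
    path, s2 = [], sigma2_start
    for a in shocks:                     # each shock feeds the next period's variance
        s2 = omega + alpha * a * a + beta * s2
        path.append(s2)
    return path

omega, alpha, beta = 3.834, 0.110, 0.825    # estimates reported in Eq. (10.38)
uncond = omega / (1.0 - alpha - beta)       # implied unconditional variance, about 58.98

# Starting at the unconditional level with shocks of matching size keeps
# the recursion at that level, a simple sanity check on the formula
steady = garch11_path([math.sqrt(uncond)] * 3, omega, alpha, beta, uncond)
```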

Using $\sigma_t^2$ of model (10.38) as a common volatility factor, we obtain the
following model for the original monthly log returns. The mean equations are

$$\begin{aligned}
r_{1t} &= 1.140 + 0.079 r_{1,t-1} + 0.067 r_{1,t-2} - 0.122 r_{2,t-2} + a_{1t}, \\
r_{2t} &= 0.537 + a_{2t},
\end{aligned}$$

where standard errors of the parameters in the first equation are 0.211, 0.030, 0.031,
and 0.043, respectively, and standard error of the parameter in the second equation
is 0.165. The conditional variance equation is

$$\begin{bmatrix} \sigma_{11,t} \\ \sigma_{22,t} \end{bmatrix} =
\begin{bmatrix} 19.08\ (3.70) \\ -5.62\ (2.36) \end{bmatrix} +
\begin{bmatrix} 0.098\ (0.044) \\ \cdot \end{bmatrix} a_{1,t-1}^2 +
\begin{bmatrix} 0.333\ (0.076) \\ 0.596\ (0.050) \end{bmatrix} \sigma_t^2, \tag{10.39}$$

Figure 10.17 (a) Time plot of first principal component of monthly log returns of IBM stock and S&P
500 index. (b) Fitted volatility process based on a GARCH(1,1) model.

where, as before, standard errors are in parentheses, and $\sigma_t^2$ is obtained from
model (10.38). The conditional correlation equation is

$$\rho_t = \frac{\exp(q_t)}{1 + \exp(q_t)}, \qquad
q_t = -2.098 + 4.120 \rho_{t-1} + 0.078 \frac{a_{1,t-1} a_{2,t-1}}{\sqrt{\sigma_{11,t-1}\sigma_{22,t-1}}}, \tag{10.40}$$

where standard errors of the three parameters are 0.025, 0.038, and 0.015,
respectively. Defining the standardized residuals as before, we obtain
$Q_2(4) = 15.37(0.29)$ and $Q_2(8) = 34.24(0.23)$, where the number in parentheses
denotes the p value. Therefore, the standardized residuals have no serial
correlations. Yet we have $Q_2^*(4) = 20.25(0.09)$ and $Q_2^*(8) = 61.95(0.0004)$ for
the squared standardized residuals. The volatility model in Eq. (10.39) does not
adequately handle the conditional heteroscedasticity of the data especially at
higher lags. This is not surprising as the single common factor only explains about
82.5% of the generalized variance of the data.

Comparing the factor model in Eqs. (10.39) and (10.40) with the time-varying 
correlation model in Eqs. (10.26) and (10.27), we see that (a) the correlation 
equations of the two models are essentially the same, (b) as expected the factor 
model uses fewer parameters in the volatility equation, and (c) the common- 
factor model provides a reasonable approximation to the volatility process of 
the data. 
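The correlation equation (10.40) maps an unrestricted quantity $q_t$ into a correlation through the logistic function, so the fitted $\rho_t$ always lies strictly between 0 and 1. The following sketch performs one such update. The function name and the input values are hypothetical; the default coefficients are the estimates quoted in Eq. (10.40).

```python
import math

def rho_update(rho_prev, a1_prev, a2_prev, s11_prev, s22_prev,
               c0=-2.098, c1=4.120, c2=0.078):
    """One step of Eq. (10.40): q_t is linear in past quantities, rho_t is logistic in q_t."""
    q = c0 + c1 * rho_prev + c2 * a1_prev * a2_prev / math.sqrt(s11_prev * s22_prev)
    return math.exp(q) / (1.0 + math.exp(q))

# Hypothetical inputs: previous correlation 0.5 and no shocks in the previous period
rho_quiet = rho_update(0.5, 0.0, 0.0, 40.0, 30.0)

# Two large shocks of the same sign push the next correlation upward
rho_comove = rho_update(0.5, 10.0, 8.0, 40.0, 30.0)
```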


Remark. In Example 10.8, we used a two-step estimation procedure. In the
first step, a volatility model is built for the common factor. The estimated volatility
is treated as given in the second step to estimate the multivariate volatility model.
Such an estimation procedure is simple but may not be efficient. A more efficient
estimation procedure is to perform a joint estimation. This can be done relatively
easily provided that the common factors are known. For example, for the monthly
log returns of Example 10.8, a joint estimation of Eqs. (10.38)-(10.40) can be
performed if the common factor $x_t = 0.796 r_{1t} + 0.605 r_{2t}$ is treated as given.


10.7 APPLICATION 


We illustrate the application of multivariate volatility models by considering the 
value at risk (VaR) of a financial position with multiple assets. Suppose that an 
investor holds a long position in the stocks of Cisco Systems and Intel Corpora- 
tion each worth $1 million. We use the daily log returns for the two stocks from 
January 2, 1991, to December 31, 1999, to build volatility models. The VaR is 
computed using the 1-step-ahead forecasts at the end of data span and 5% critical 
values. 

Let $\mathrm{VaR}_1$ be the value at risk for holding the position on Cisco Systems stock
and $\mathrm{VaR}_2$ for holding Intel stock. Results of Chapter 7 show that the overall
daily VaR for the investor is

$$\mathrm{VaR} = \sqrt{\mathrm{VaR}_1^2 + \mathrm{VaR}_2^2 + 2\rho\,\mathrm{VaR}_1 \mathrm{VaR}_2}.$$


In this illustration, we consider three approaches to volatility modeling for
calculating VaR. For simplicity, we do not report standard errors for the parameters
involved or model checking statistics. Yet all of the estimates are statistically
significant at the 5% level, and the models are adequate based on the Ljung-Box
statistics of the standardized residual series and their squared series. The log returns
are in percentages so that the quantiles are divided by 100 in VaR calculations. Let
$r_{1t}$ be the return of Cisco stock and $r_{2t}$ the return of Intel stock.

Univariate Models

This approach uses a univariate volatility model for each stock return and uses the
sample correlation coefficient of the stock returns to estimate $\rho$. The univariate
volatility models for the two stock returns are

$$\begin{aligned}
r_{1t} &= 0.380 + 0.034 r_{1,t-1} - 0.061 r_{1,t-2} - 0.055 r_{1,t-3} + a_{1t}, \\
\sigma_{1t}^2 &= 0.599 + 0.117 a_{1,t-1}^2 + 0.814 \sigma_{1,t-1}^2,
\end{aligned}$$

and

$$\begin{aligned}
r_{2t} &= 0.187 + a_{2t}, \\
\sigma_{2t}^2 &= 0.310 + 0.032 a_{2,t-1}^2 + 0.918 \sigma_{2,t-1}^2.
\end{aligned}$$

The sample correlation coefficient is 0.473. The 1-step-ahead forecasts needed in
VaR calculation at the forecast origin $t = 2275$ are

$$\hat{r}_1 = 0.626, \quad \hat{\sigma}_1^2 = 4.152, \quad \hat{r}_2 = 0.187, \quad
\hat{\sigma}_2^2 = 6.087, \quad \hat{\rho} = 0.473.$$

The 5% quantiles for both daily returns are

$$q_1 = 0.626 - 1.65\sqrt{4.152} = -2.736, \qquad q_2 = 0.187 - 1.65\sqrt{6.087} = -3.884,$$

where the negative sign denotes loss. For the individual stocks,
$\mathrm{VaR}_1 = \$1{,}000{,}000\,|q_1|/100 = \$27{,}360$ and
$\mathrm{VaR}_2 = \$1{,}000{,}000\,|q_2|/100 = \$38{,}840$. Consequently, the overall VaR
for the investor is VaR = $57,117.
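The arithmetic of this subsection can be retraced with the short sketch below. The function names are ours; the inputs are the forecasts reported above, and the quantiles are rounded to three decimals, as in the text, before being converted to dollar amounts.

```python
import math

def quantile_5pct(mean, variance):
    """5% quantile of a conditionally normal return, using the 1.65 critical value."""
    return mean - 1.65 * math.sqrt(variance)

def position_var(quantile_pct, value=1_000_000):
    """Dollar VaR of a long position when returns are measured in percentages."""
    return value * abs(quantile_pct) / 100.0

def portfolio_var(var1, var2, rho):
    """Overall VaR = sqrt(VaR1^2 + VaR2^2 + 2*rho*VaR1*VaR2)."""
    return math.sqrt(var1 ** 2 + var2 ** 2 + 2.0 * rho * var1 * var2)

q1 = round(quantile_5pct(0.626, 4.152), 3)   # Cisco, -2.736
q2 = round(quantile_5pct(0.187, 6.087), 3)   # Intel, -3.884
var1 = position_var(q1)                      # 27,360
var2 = position_var(q2)                      # 38,840
total = portfolio_var(var1, var2, 0.473)     # about 57,117
```

The same three functions apply to the other two approaches once their own forecasts of the means, variances, and correlation are substituted.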


Constant-Correlation Bivariate Model

This approach employs a bivariate GARCH(1,1) model for the stock returns. The
correlation coefficient is assumed to be constant over time, but it is estimated jointly
with other parameters. The model is

$$\begin{aligned}
r_{1t} &= 0.385 + 0.038 r_{1,t-1} - 0.060 r_{1,t-2} - 0.047 r_{1,t-3} + a_{1t}, \\
r_{2t} &= 0.222 + a_{2t}, \\
\sigma_{11,t} &= 0.624 + 0.110 a_{1,t-1}^2 + 0.816 \sigma_{11,t-1}, \\
\sigma_{22,t} &= 0.664 + 0.038 a_{2,t-1}^2 + 0.853 \sigma_{22,t-1},
\end{aligned}$$

and $\hat{\rho} = 0.475$. This is a diagonal bivariate GARCH(1,1) model. The 1-step-ahead
forecasts for VaR calculation at the forecast origin $t = 2275$ are

$$\hat{r}_1 = 0.373, \quad \hat{\sigma}_1^2 = 4.287, \quad \hat{r}_2 = 0.222, \quad
\hat{\sigma}_2^2 = 5.706, \quad \hat{\rho} = 0.475.$$


Consequently, we have $\mathrm{VaR}_1$ = $30,432 and $\mathrm{VaR}_2$ = $37,195. The
overall 5% VaR for the investor is VaR = $58,180.

Time-Varying Correlation Model

Finally, we allow the correlation coefficient to evolve over time by using the
Cholesky decomposition. The fitted model is

$$\begin{aligned}
r_{1t} &= 0.355 + 0.039 r_{1,t-1} - 0.057 r_{1,t-2} - 0.038 r_{1,t-3} + a_{1t}, \\
r_{2t} &= 0.206 + a_{2t}, \\
g_{11,t} &= 0.420 + 0.091 b_{1,t-1}^2 + 0.858 g_{11,t-1}, \\
q_{21,t} &= 0.123 + 0.689 q_{21,t-1} - 0.014 a_{2,t-1}, \\
g_{22,t} &= 0.080 + 0.013 b_{2,t-1}^2 + 0.971 g_{22,t-1},
\end{aligned}$$

where $b_{1t} = a_{1t}$ and $b_{2t} = a_{2t} - q_{21,t} a_{1t}$. The 1-step-ahead
forecasts for VaR calculation at the forecast origin $t = 2275$ are

$$\hat{r}_1 = 0.352, \quad \hat{r}_2 = 0.206, \quad \hat{g}_{11} = 4.252, \quad
\hat{q}_{21} = 0.421, \quad \hat{g}_{22} = 5.594.$$

Therefore, we have $\hat{\sigma}_1^2 = 4.252$, $\hat{\sigma}_{21} = 1.791$, and
$\hat{\sigma}_2^2 = 6.348$. The correlation coefficient is $\hat{\rho} = 0.345$. Using
these forecasts, we have $\mathrm{VaR}_1$ = $30,504, $\mathrm{VaR}_2$ =
$39,512, and the overall VaR = $57,648.

The estimated VaR values of the three approaches are similar. The univariate 
models give the lowest VaR, whereas the constant-correlation model produces the 
highest VaR. The range of the difference is about $1100. The time-varying volatility 
model seems to produce a compromise between the two extreme models. 
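
As a check on the step from the Cholesky quantities to the covariance matrix, the following Python sketch applies the identities \(\sigma_{11} = g_{11}\), \(\sigma_{21} = q_{21} g_{11}\), and \(\sigma_{22} = g_{22} + q_{21}^2 g_{11}\) to the forecasts above. The function name is hypothetical, and the VaR combination rule is the same assumed one as for the other two approaches. Small differences from the reported numbers come from rounding \(\hat{q}_{21}\) to three decimals.

```python
import math

def cholesky_to_cov(g11, g22, q21):
    """Recover (sigma11, sigma21, sigma22, rho) from the Cholesky quantities."""
    s11 = g11
    s21 = q21 * g11
    s22 = g22 + q21 ** 2 * g11
    rho = s21 / math.sqrt(s11 * s22)
    return s11, s21, s22, rho

# 1-step-ahead forecasts of the time-varying correlation model.
s11, s21, s22, rho = cholesky_to_cov(g11=4.252, g22=5.594, q21=0.421)

# 5% quantiles (in percent) and dollar VaR for $1 million in each stock.
var1 = 1_000_000 * abs(0.352 - 1.65 * math.sqrt(s11)) / 100.0
var2 = 1_000_000 * abs(0.206 - 1.65 * math.sqrt(s22)) / 100.0
total = math.sqrt(var1 ** 2 + var2 ** 2 + 2.0 * rho * var1 * var2)

print(f"sigma21 = {s21:.3f}, sigma22 = {s22:.3f}, rho = {rho:.3f}")
print(f"VaR1 = {var1:,.0f}, VaR2 = {var2:,.0f}, overall = {total:,.0f}")
```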


10.8 MULTIVARIATE t DISTRIBUTION


Empirical analysis indicates that the multivariate Gaussian innovations used in the 
previous sections may fail to capture the kurtosis of asset returns. In this situation, 
a multivariate Student-t distribution might be useful. There are many versions of 
the multivariate Student-t distribution. We give a simple version here for volatility 
modeling. 

A \(k\)-dimensional random vector \(x = (x_1, \ldots, x_k)'\) has a multivariate Student-t
distribution with \(v\) degrees of freedom and parameters \(\mu = 0\) and \(\Sigma = I\) (the
identity matrix) if its probability density function (pdf) is

\[
f(x \mid v) = \frac{\Gamma[(v + k)/2]}{(\pi v)^{k/2}\,\Gamma(v/2)} \left(1 + v^{-1} x'x\right)^{-(v+k)/2}, \tag{10.41}
\]

where \(\Gamma(y)\) is the gamma function; see Mardia, Kent, and Bibby (1979, p. 57). The
variance of each component \(x_i\) in Eq. (10.41) is \(v/(v - 2)\), and hence we define
\(\epsilon_t = \sqrt{(v - 2)/v}\,x\) as the standardized multivariate Student-t distribution with \(v\)
degrees of freedom. By transformation, the pdf of \(\epsilon_t\) is


\[
f(\epsilon_t \mid v) = \frac{\Gamma[(v + k)/2]}{[\pi(v - 2)]^{k/2}\,\Gamma(v/2)} \left[1 + (v - 2)^{-1} \epsilon_t'\epsilon_t\right]^{-(v+k)/2}. \tag{10.42}
\]

For volatility modeling, we write \(a_t = \Sigma_t^{1/2}\epsilon_t\) and assume that \(\epsilon_t\) follows the
multivariate Student-t distribution in Eq. (10.42). By transformation, the pdf of
\(a_t\) is


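\[
f(a_t \mid v, \Sigma_t) = \frac{\Gamma[(v + k)/2]}{[\pi(v - 2)]^{k/2}\,\Gamma(v/2)\,|\Sigma_t|^{1/2}} \left[1 + (v - 2)^{-1} a_t'\Sigma_t^{-1}a_t\right]^{-(v+k)/2}.
\]

Furthermore, if we use the Cholesky decomposition of \(\Sigma_t\), then the pdf of the
transformed shock \(b_t\) becomes

\[
f(b_t \mid v, L_t, G_t) = \frac{\Gamma[(v + k)/2]}{[\pi(v - 2)]^{k/2}\,\Gamma(v/2)\,\prod_{j=1}^{k} g_{jj,t}^{1/2}} \left[1 + (v - 2)^{-1} \sum_{j=1}^{k} \frac{b_{jt}^2}{g_{jj,t}}\right]^{-(v+k)/2},
\]

where \(a_t = L_t b_t\) and \(g_{jj,t}\) is the conditional variance of \(b_{jt}\). Because this pdf does
not involve any matrix inversion, the conditional-likelihood function of the data is
easy to evaluate.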
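
To illustrate why the Cholesky form is convenient, the Python sketch below evaluates both log densities for a bivariate case and confirms that they agree, since \(L_t\) has unit diagonal and the transformation has unit Jacobian. The function names and the numerical inputs (taken, purely for illustration, from the forecasts of Section 10.7, with \(v = 6\) chosen arbitrarily) are assumptions and are not part of the original text.

```python
import math

def t_log_constant(v, k):
    """Log of Gamma[(v+k)/2] / { [pi (v-2)]^{k/2} Gamma(v/2) }."""
    return (math.lgamma((v + k) / 2.0) - math.lgamma(v / 2.0)
            - 0.5 * k * math.log(math.pi * (v - 2.0)))

def logpdf_a(a, sigma, v):
    """Log pdf of a_t (k = 2), using the explicit inverse of Sigma_t."""
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 * s12
    # Quadratic form a' Sigma^{-1} a written out for a symmetric 2 x 2 matrix.
    quad = (s22 * a[0] ** 2 - 2.0 * s12 * a[0] * a[1] + s11 * a[1] ** 2) / det
    return (t_log_constant(v, 2) - 0.5 * math.log(det)
            - 0.5 * (v + 2) * math.log1p(quad / (v - 2.0)))

def logpdf_b(b, g, v):
    """Log pdf of b_t, where g holds the diagonal elements g_{jj,t} of G_t."""
    k = len(b)
    quad = sum(bj ** 2 / gj for bj, gj in zip(b, g))
    return (t_log_constant(v, k) - 0.5 * sum(math.log(gj) for gj in g)
            - 0.5 * (v + k) * math.log1p(quad / (v - 2.0)))

# Illustrative inputs: Sigma_t = L_t G_t L_t' with L_t = [[1, 0], [q21, 1]].
g11, g22, q21, v = 4.252, 5.594, 0.421, 6.0
sigma = ((g11, q21 * g11), (q21 * g11, g22 + q21 ** 2 * g11))
a = (1.0, -2.0)
b = (a[0], a[1] - q21 * a[0])   # b_t = L_t^{-1} a_t

lp_a = logpdf_a(a, sigma, v)
lp_b = logpdf_b(b, (g11, g22), v)
print(lp_a, lp_b)
```

The second function needs only sums and logarithms of the \(g_{jj,t}\), which is what makes the likelihood inexpensive when \(k\) is large.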


APPENDIX: SOME REMARKS ON ESTIMATION 


The estimation of multivariate ARMA models in this chapter is done by using 
the time series program SCA of Scientific Computing Associates. The estimation 
of multivariate volatility models is done by using either the S-Plus package with 
FinMetrics or the Regression Analysis for Time Series (RATS) program or Matlab. 
Below are some run streams for estimating multivariate volatility models using the 
RATS program. A line starting with * means “comment” only. 


Estimation of the Diagonal Constant-Correlation AR(2)—GARCH(1,1) Model 
for Example 10.5 

The program includes some Ljung–Box statistics for each component and some
fitted values for the last few observations. The data file is m-ibmspln.txt, which
has two columns, and there are 888 observations.


all 0 888:1
open data m-ibmspln.txt
data(org=obs) / r1 r2
set h1 = 0.0
set h2 = 0.0
nonlin a0 a1 b1 a00 a11 b11 rho c1 c2 p1
frml a1t = r1(t)-c1-p1*r2(t-1)
frml a2t = r2(t)-c2
frml gvar1 = a0+a1*a1t(t-1)**2+b1*h1(t-1)
frml gvar2 = a00+a11*a2t(t-1)**2+b11*h2(t-1)
frml gdet = -0.5*(log(h1(t)=gvar1(t))+log(h2(t)=gvar2(t)) $
  +log(1.0-rho**2))
frml gln = gdet(t)-0.5/(1.0-rho**2)*((a1t(t)**2/h1(t)) $
  +(a2t(t)**2/h2(t))-2*rho*a1t(t)*a2t(t)/sqrt(h1(t)*h2(t)))
smpl 3 888
compute c1 = 1.22, c2 = 0.57, p1 = 0.1, rho = 0.1
compute a0 = 3.27, a1 = 0.1, b1 = 0.6
compute a00 = 1.17, a11 = 0.13, b11 = 0.8
maximize(method=bhhh,recursive,iterations=150) gln
set fv1 = gvar1(t)
set resi1 = a1t(t)/sqrt(fv1(t))
set residsq = resi1(t)*resi1(t)
* Checking standardized residuals *
cor(qstats,number=12,span=4) resi1
* Checking squared standardized residuals *
cor(qstats,number=12,span=4) residsq
set fv2 = gvar2(t)
set resi2 = a2t(t)/sqrt(fv2(t))
set residsq = resi2(t)*resi2(t)
* Checking standardized residuals *
cor(qstats,number=12,span=4) resi2
* Checking squared standardized residuals *
cor(qstats,number=12,span=4) residsq
* Last few observations needed for computing forecasts *
set shock1 = a1t(t)
set shock2 = a2t(t)
print 885 888 shock1 shock2 fv1 fv2


Estimation of the Time-Varying Coefficient Model in Example 10.5 


all 0 888:1
open data m-ibmspln.txt
data(org=obs) / r1 r2
set h1 = 45.0
set h2 = 31.0
set rho = 0.8
nonlin a0 a1 b1 f1 a00 a11 b11 d11 f11 c1 c2 p1 p3 q0 q1 q2
frml a1t = r1(t)-c1-p1*r1(t-1)-p3*r2(t-2)
frml a2t = r2(t)-c2
frml gvar1 = a0+a1*a1t(t-1)**2+b1*h1(t-1)+f1*h2(t-1)
frml gvar2 = a00+a11*a2t(t-1)**2+b11*h2(t-1)+f11*h1(t-1) $
  +d11*a1t(t-1)**2
frml rh1 = q0 + q1*rho(t-1) $
  + q2*a1t(t-1)*a2t(t-1)/sqrt(h1(t-1)*h2(t-1))
frml rh = exp(rh1(t))/(1+exp(rh1(t)))
frml gdet = -0.5*(log(h1(t)=gvar1(t))+log(h2(t)=gvar2(t)) $
  +log(1.0-(rho(t)=rh(t))**2))
frml gln = gdet(t)-0.5/(1.0-rho(t)**2)*((a1t(t)**2/h1(t)) $
  +(a2t(t)**2/h2(t))-2*rho(t)*a1t(t)*a2t(t)/sqrt(h1(t)*h2(t)))
smpl 4 888
compute c1 = 1.4, c2 = 0.7, p1 = 0.1, p3 = -0.1
compute a0 = 2.95, a1 = 0.08, b1 = 0.87, f1 = -.03
compute a00 = 2.05, a11 = 0.05
compute b11 = 0.92, f11=-.06, d11=.04, q0 = -2.0
compute q1 = 3.0, q2 = 0.1
nlpar(criterion=value,cvcrit=0.00001)
maximize(method=bhhh,recursive,iterations=150) gln
set fv1 = gvar1(t)
set resi1 = a1t(t)/sqrt(fv1(t))
set residsq = resi1(t)*resi1(t)
* Checking standardized residuals *
cor(qstats,number=16,span=4) resi1
* Checking squared standardized residuals *
cor(qstats,number=16,span=4) residsq
set fv2 = gvar2(t)
set resi2 = a2t(t)/sqrt(fv2(t))
set residsq = resi2(t)*resi2(t)
* Checking standardized residuals *
cor(qstats,number=16,span=4) resi2
* Checking squared standardized residuals *
cor(qstats,number=16,span=4) residsq
* Last few observations needed for computing forecasts *
set rhohat = rho(t)
set shock1 = a1t(t)
set shock2 = a2t(t)
print 885 888 shock1 shock2 fv1 fv2 rhohat


Estimation of the Time-Varying Coefficient Model in Example 10.5 Using 
Cholesky Decomposition 


all 0 888:1
open data m-ibmspln.txt
data(org=obs) / r1 r2
set h1 = 45.0
set h2 = 20.0
set q = 0.8
nonlin a0 a1 b1 a00 a11 b11 d11 f11 c1 c2 p1 p3 t0 t1 t2
frml a1t = r1(t)-c1-p1*r1(t-1)-p3*r2(t-2)
frml a2t = r2(t)-c2
frml v1 = a0+a1*a1t(t-1)**2+b1*h1(t-1)
frml qt = t0 + t1*q(t-1) + t2*a2t(t-1)
frml bt = a2t(t) - (q(t)=qt(t))*a1t(t)
frml v2 = a00+a11*bt(t-1)**2+b11*h2(t-1)+f11*h1(t-1) $
  +d11*a1t(t-1)**2
frml gdet = -0.5*(log(h1(t) = v1(t))+ log(h2(t)=v2(t)))
frml garchln = gdet-0.5*(a1t(t)**2/h1(t)+bt(t)**2/h2(t))
smpl 5 888
compute c1 = 1.4, c2 =
compute a0 = 1.0, a1 =
compute a00 = 2.0, a11
compute d11=.04, f11=-.06, t0 =0.2, t1 = 0.1, t2 = 0.1
nlpar(criterion=value,cvcrit=0.00001)
maximize(method=bhhh,recursive,iterations=150) garchln
set fv1 = v1(t)
set resi1 = a1t(t)/sqrt(fv1(t))
set residsq = resi1(t)*resi1(t)
* Checking standardized residuals *
cor(qstats,number=16,span=4) resi1
* Checking squared standardized residuals *
cor(qstats,number=16,span=4) residsq
set fv2 = v2(t)+qt(t)**2*v1(t)
set resi2 = a2t(t)/sqrt(fv2(t))
set residsq = resi2(t)*resi2(t)
* Checking standardized residuals *
cor(qstats,number=16,span=4) resi2
* Checking squared standardized residuals *
cor(qstats,number=16,span=4) residsq
* Last few observations needed for forecasts *
set rhohat = qt(t)*sqrt(v1(t)/fv2(t))
set shock1 = a1t(t)
set shock2 = a2t(t)
set g22 = v2(t)
set q21 = qt(t)
set b2t = bt(t)
print 885 888 shock1 shock2 fv1 fv2 rhohat g22 q21 b2t


Estimation of Three-Dimensional Time-Varying Correlation Volatility Model in 
Example 10.7 Using Cholesky Decomposition 
Initial estimates are obtained by a sequential modeling procedure. 


all 0 2275:1
open data d-cscointc.txt
data(org=obs) / r1 r2 r3
set h1 = 1.0
set h2 = 4.0
set h3 = 3.0
set q21 = 0.0
set q31 = 0.0
set q32 = 0.0
nonlin c1 c2 c3 p3 p21 p22 p31 p33 a0 a1 a2 t0 t1 t2 b0 b1 $
  b2 u0 u1 u2 w0 w1 w2 d0 d1 d2 d5
frml a1t = r1(t)-c1-p3*r1(t-3)
frml a2t = r2(t)-c2-p21*r1(t-2)-p22*r2(t-2)
frml a3t = r3(t)-c3-p31*r1(t-1)-p33*r3(t-1)
frml v1 = a0+a1*a1t(t-1)**2+a2*h1(t-1)
frml q1t = t0 + t1*q21(t-1) + t2*a2t(t-1)
frml bt = a2t(t) - (q21(t)=q1t(t))*a1t(t)
frml v2 = b0+b1*bt(t-1)**2+b2*h2(t-1)
frml q2t = u0 + u1*q31(t-1) + u2*a3t(t-1)
frml q3t = w0 + w1*q32(t-1) + w2*a2t(t-1)
frml b1t = a3t(t)-(q31(t)=q2t(t))*a1t(t)-(q32(t)=q3t(t))*bt(t)
frml v3 = d0+d1*b1t(t-1)**2+d2*h3(t-1)+d5*h2(t-1)
frml gdet = -0.5*(log(h1(t) = v1(t))+ log(h2(t)=v2(t)) $
  +log(h3(t)=v3(t)))
frml garchln = gdet-0.5*(a1t(t)**2/h1(t)+bt(t)**2/h2(t) $
  +b1t(t)**2/h3(t))
smpl 8 2275
compute c1 = 0.07, c2 = 0.33, c3 = 0.19, p1 = 0.1, p3 = -0.04
compute p21 = 0.22, p22 = -0.1, p31 = -0.26, p33 = 0.06
compute a0 = .01, a1 = .05, a2 = 0.94
compute t0 = , t1 = .82, t2 = -0.035
compute b0 = .17, b1 = .08, b2 = 0.89
compute u0 = 0.04, u1 = 0.73, u2 = 0.04
compute w0 = 0.006, w1 = .98, w2 = 0.004
compute d0 = 1.38, d1 = 0.06, d2 = 0.64, d5 = -0.027
nlpar(criterion=value,cvcrit=0.00001)
maximize(method=bhhh,recursive,iterations=250) garchln
set fv1 = v1(t)
set resi1 = a1t(t)/sqrt(fv1(t))
set residsq = resi1(t)*resi1(t)
* Checking standardized residuals *
cor(qstats,number=12,span=4) resi1
* Checking squared standardized residuals *
cor(qstats,number=12,span=4) residsq
set fv2 = v2(t)+q1t(t)**2*v1(t)
set resi2 = a2t(t)/sqrt(fv2(t))
set residsq = resi2(t)*resi2(t)
* Checking standardized residuals *
cor(qstats,number=12,span=4) resi2
* Checking squared standardized residuals *
cor(qstats,number=12,span=4) residsq
set fv3 = v3(t)+q2t(t)**2*v1(t)+q3t(t)**2*v2(t)
set resi3 = a3t(t)/sqrt(fv3(t))
set residsq = resi3(t)*resi3(t)
* Checking standardized residuals *
cor(qstats,number=12,span=4) resi3
* Checking squared standardized residuals *
cor(qstats,number=12,span=4) residsq
* print standardized residuals and correlation-coefficients
set rho21 = q1t(t)*sqrt(v1(t)/fv2(t))
set rho31 = q2t(t)*sqrt(v1(t)/fv3(t))
set rho32 = (q2t(t)*q1t(t)*v1(t) $
  +q3t(t)*v2(t))/sqrt(fv2(t)*fv3(t))
print 10 2275 resi1 resi2 resi3
print 10 2275 rho21 rho31 rho32
print 10 2275 fv1 fv2 fv3


EXERCISES 


10.1. Consider the monthly simple returns, including dividends, of IBM stock,
Hewlett-Packard (HPQ) stock, and the S&P composite index from January
1962 to December 2008 for 564 observations. The returns are in the file
m-ibmhpqsp6208.txt. Transform into log returns in percentages. Use the
exponentially weighted moving-average method to obtain a multivariate
volatility series for the three return series. What is the estimated \(\lambda\)? Plot the
three volatility series.

10.2. Focus on the monthly log returns of IBM and HPQ stocks from January
1962 to December 2008. Fit a DVEC(1,1) model to the bivariate return
series. Is the model adequate? Plot the fitted volatility series and the
time-varying correlations.

10.3. Focus on the monthly log returns of the S&P composite index and HPQ
stock. Build a BEKK model for the bivariate series. What is the fitted
model? Plot the fitted volatility series and the time-varying correlations.

10.4. Build a constant-correlation volatility model for the three monthly log
returns of IBM stock, HPQ stock, and S&P composite index. Write down
the fitted model. Is the model adequate? Why?

10.5. The file m-geibmsp2608.txt contains the monthly simple returns of General
Electric stock, IBM stock, and the S&P composite index from January
1926 to December 2008. The returns include dividends. Transform into log
returns in percentages. Focus on the monthly log returns in percentages of
GE stock and the S&P 500 index. Build a constant-correlation GARCH
model for the bivariate series. Check the adequacy of the fitted model, and
obtain the 1-step-ahead forecast of the covariance matrix at the forecast
origin December 2008.

10.6. Again, consider the monthly log returns of GE, IBM, and S&P composite
index from January 1926 to December 2008. Build a dynamic correlation
model for the three-dimensional series. For simplicity, use the sample
correlation matrix for \(\bar{\rho}\) in Eq. (10.32).

10.7. The file m-spibmge.txt contains the monthly log returns in percentages
of the S&P composite index, IBM stock, and GE stock from January 1926
to December 1999. Focus on GE stock and the S&P 500 index. Build a
time-varying correlation GARCH model for the bivariate series using a
logistic function for the correlation coefficient. Check the adequacy of the
fitted model, and obtain the 1-step-ahead forecast of the covariance matrix
at the forecast origin December 1999.

10.8. Focus on the monthly log returns in percentages of GE stock and the S&P
500 index from January 1926 to December 1999. Build a time-varying
correlation GARCH model for the bivariate series using the Cholesky
decomposition. Check the adequacy of the fitted model, and obtain the
1-step-ahead forecast of the covariance matrix at the forecast origin
December 1999. Compare the model with the other model built in the
previous exercise.

10.9. Consider the three-dimensional return series of the previous exercise jointly.
Build a multivariate time-varying volatility model for the data, using the
Cholesky decomposition. Discuss the implications of the model and compute
the 1-step-ahead volatility forecast at the forecast origin \(t = 888\).

10.10. An investor is interested in daily value at risk of his position on holding
long $0.5 million of Dell stock and $1 million of Cisco Systems stock.
Use 5% critical values and the daily log returns from February 20, 1990,
to December 31, 1999, to do the calculation. The data are in the file
d-dellcsco9099.txt. Apply the three approaches to volatility modeling in
Section 10.7 and compare the results.


CHAPTER 11


State-Space Models
and Kalman Filter


The state-space model provides a flexible approach to time series analysis, espe- 
cially for simplifying maximum-likelihood estimation and handling missing values. 
In this chapter, we discuss the relationship between the state-space model and the 
ARIMA model, the Kalman filter algorithm, various smoothing methods, and some 
applications. We begin with a simple model that shows the basic ideas of the state- 
space approach to time series analysis before introducing the general state-space 
model. For demonstrations, we use the model to analyze realized volatility series of 
asset returns, the time-varying coefficient market models, and the quarterly earnings 
per share of a company. 

There are many books on statistical analysis using the state-space model. Durbin 
and Koopman (2001) provide a recent treatment of the approach, Kim and Nelson 
(1999) focus on economic applications and regime switching, and Anderson and 
Moore (1979) give a nice summary of theory and applications of the approach for 
engineering and optimal control. Many time series textbooks include the Kalman 
filter and state-space model. For example, Chan (2002), Shumway and Stoffer 
(2000), Hamilton (1994), and Harvey (1993) all have chapters on the topic. West 
and Harrison (1997) provide a Bayesian treatment with emphasis on forecasting, 
and Kitagawa and Gersch (1996) use a smoothing prior approach. 

The derivation of the Kalman filter and smoothing algorithms necessarily involves
heavy notation. Therefore, Section 11.4 could be dry for readers who are interested
mainly in the concept and applications of state-space models and can be skipped
on the first read.


11.1 LOCAL TREND MODEL


Consider the univariate time series \(y_t\) satisfying

\[
y_t = \mu_t + e_t, \qquad e_t \sim N(0, \sigma_e^2), \tag{11.1}
\]
\[
\mu_{t+1} = \mu_t + \eta_t, \qquad \eta_t \sim N(0, \sigma_\eta^2), \tag{11.2}
\]

where \(\{e_t\}\) and \(\{\eta_t\}\) are two independent Gaussian white noise series and
\(t = 1, \ldots, T\). The initial value \(\mu_1\) is either given or follows a known distribution, and
it is independent of \(\{e_t\}\) and \(\{\eta_t\}\) for \(t > 0\). Here \(\mu_t\) is a pure random walk of
Chapter 2 with initial value \(\mu_1\), and \(y_t\) is an observed version of \(\mu_t\) with added
noise \(e_t\). In the literature, \(\mu_t\) is referred to as the trend of the series, which is not
directly observable, and \(y_t\) is the observed data with observational noise \(e_t\). The
dynamic dependence of \(y_t\) is governed by that of \(\mu_t\) because \(\{e_t\}\) is not serially
correlated.

The model in Eqs. (11.1) and (11.2) can readily be used to analyze realized
volatility of an asset price; see Example 11.1. Here \(\mu_t\) represents the underlying
log volatility of the asset price and \(y_t\) is the logarithm of realized volatility. The
true log volatility is not directly observed but evolves over time according to a
random-walk model. On the other hand, \(y_t\) is constructed from high-frequency
transactions data and subjected to the influence of market microstructure noises.
The standard deviation of \(e_t\) denotes the scale used to measure the impact of market
microstructure noises.

The model in Eqs. (11.1) and (11.2) is a special linear Gaussian state-space
model. The variable \(\mu_t\) is called the state of the system at time \(t\) and is
not directly observed. Equation (11.1) provides the link between the data \(y_t\)
and the state \(\mu_t\) and is called the observation equation with measurement
error \(e_t\). Equation (11.2) governs the time evolution of the state variable
and is the state equation (or state transition equation) with innovation \(\eta_t\).
The model is also called a local-level model in Durbin and Koopman (2001,
Chapter 2), which is a simple case of the structural time series model of
Harvey (1993).


Relationship to ARIMA Model 

If there is no measurement error in Eq. (11.1), that is, \(\sigma_e = 0\), then \(y_t = \mu_t\), which
is an ARIMA(0,1,0) model. If \(\sigma_e > 0\), that is, there exist measurement errors, then
\(y_t\) is an ARIMA(0,1,1) model satisfying

\[
(1 - B)y_t = (1 - \theta B)a_t, \tag{11.3}
\]

where \(\{a_t\}\) is a Gaussian white noise with mean zero and variance \(\sigma_a^2\). The values
of \(\theta\) and \(\sigma_a^2\) are determined by \(\sigma_e\) and \(\sigma_\eta\). This result can be derived as follows.
From Eq. (11.2), we have

\[
(1 - B)\mu_{t+1} = \eta_t, \quad \text{or} \quad \mu_{t+1} = \frac{1}{1 - B}\eta_t.
\]

Using this result, Eq. (11.1) can be written as

\[
y_t = \frac{1}{1 - B}\eta_{t-1} + e_t.
\]

Multiplying by \((1 - B)\), we have

\[
(1 - B)y_t = \eta_{t-1} + e_t - e_{t-1}.
\]

Let \((1 - B)y_t = w_t\). We have \(w_t = \eta_{t-1} + e_t - e_{t-1}\). Under the model
assumptions, it is easy to see that (a) \(w_t\) is Gaussian, (b) \(\mathrm{Var}(w_t) = 2\sigma_e^2 + \sigma_\eta^2\), (c)
\(\mathrm{Cov}(w_t, w_{t-1}) = -\sigma_e^2\), and (d) \(\mathrm{Cov}(w_t, w_{t-j}) = 0\) for \(j > 1\). Consequently, \(w_t\)
follows an MA(1) model and can be written as \(w_t = (1 - \theta B)a_t\). By equating
the variance and lag-1 autocovariance of \(w_t = (1 - \theta B)a_t = \eta_{t-1} + e_t - e_{t-1}\), we
have

\[
(1 + \theta^2)\sigma_a^2 = 2\sigma_e^2 + \sigma_\eta^2, \tag{11.4}
\]
\[
\theta\sigma_a^2 = \sigma_e^2. \tag{11.5}
\]

For given \(\sigma_e^2\) and \(\sigma_\eta^2\), one considers the ratio of the prior two equations to form a
quadratic function of \(\theta\). This quadratic form has two solutions so one should select
the one that satisfies \(|\theta| < 1\). The value of \(\sigma_a^2\) can then be easily obtained. Thus,
the state-space model in Eqs. (11.1) and (11.2) is also an ARIMA(0,1,1) model,
which is the simple exponential smoothing model of Chapter 2.

On the other hand, for an ARIMA(0,1,1) model with positive \(\theta\), one can use
the prior two identities to solve for \(\sigma_e^2\) and \(\sigma_\eta^2\) and obtain a local trend model.
If \(\theta\) is negative, then the model can still be put in a state-space form without
the observational error, that is, \(\sigma_e = 0\). In fact, as will be seen later, an ARIMA
model can be transformed into state-space models in many ways. Thus, the linear
state-space model is closely related to the ARIMA model.

In practice, what one observes is the \(y_t\) series. Thus, based on the data alone,
the decision of using ARIMA models or linear state-space models is not critical.
Both model representations have pros and cons. The objective of data analysis,
substantive issues, and experience all play a role in choosing a statistical
model.
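
The MA(1) structure of the differenced series can be checked by simulation. The Python sketch below generates data from Eqs. (11.1) and (11.2) and compares the sample autocovariances of \(w_t = (1 - B)y_t\) with the theoretical values \(2\sigma_e^2 + \sigma_\eta^2\), \(-\sigma_e^2\), and 0. The function names, the seed, the sample size, and the choice \(\sigma_e = \sigma_\eta = 1\) are illustrative assumptions only.

```python
import random

def simulate_local_trend(n, sigma_e, sigma_eta, mu1=0.0, seed=12345):
    """Simulate y_1, ..., y_n from the local trend model of Eqs. (11.1)-(11.2)."""
    rng = random.Random(seed)
    mu = mu1
    y = []
    for _ in range(n):
        y.append(mu + rng.gauss(0.0, sigma_e))   # observation equation
        mu += rng.gauss(0.0, sigma_eta)          # state equation
    return y

def autocov(x, lag):
    """Sample autocovariance of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[i] - mean) * (x[i - lag] - mean) for i in range(lag, n)) / n

y = simulate_local_trend(40000, sigma_e=1.0, sigma_eta=1.0)
w = [y[i] - y[i - 1] for i in range(1, len(y))]   # w_t = (1 - B) y_t

gamma0, gamma1, gamma2 = autocov(w, 0), autocov(w, 1), autocov(w, 2)
print(f"lag 0: {gamma0:.3f} (theory 3), lag 1: {gamma1:.3f} (theory -1), "
      f"lag 2: {gamma2:.3f} (theory 0)")
```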


Example 11.1. To illustrate the ideas of the state-space model and Kalman 
filter, we consider the intradaily realized volatility of Alcoa stock from January 2, 
2003, to May 7, 2004, for 340 observations. The daily realized volatility used is 
the sum of squares of intraday 10-minute log returns measured in percentage. No 
overnight returns or the first 10-minute intraday returns are used. See Chapter 3 for 
more information about realized volatility. The series used in the demonstration is 
the logarithm of the daily realized volatility. 

Figure 11.1 shows the time plot of the logarithms of the realized volatility of
Alcoa stock from January 2, 2003, to May 7, 2004. The transactions data are
obtained from the TAQ database of the NYSE.

Figure 11.1 Time plot of logarithms of intradaily realized volatility of Alcoa stock from January 2,
2003, to May 7, 2004. Realized volatility is computed from intraday 10-minute log returns measured
in percentage.

If ARIMA models are entertained, we obtain an ARIMA(0,1,1) model

\[
(1 - B)y_t = (1 - 0.858B)a_t, \qquad \hat{\sigma}_a = 0.5184, \tag{11.6}
\]

where \(y_t\) is the log realized volatility, and the standard error of \(\hat{\theta}\) is 0.028. The
residuals show \(Q(12) = 12.4\) with a p value of 0.33, indicating that there is
no significant serial correlation in the residuals. Similarly, the squared residuals
give \(Q(12) = 8.2\) with a p value of 0.77, suggesting no ARCH effects in the
series.
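
The fitted values in Eq. (11.6) can be mapped to the parameters of a local trend model through Eqs. (11.4) and (11.5). The Python sketch below uses \(\sigma_e^2 = \theta\sigma_a^2\) and, after substituting this into Eq. (11.4), \(\sigma_\eta^2 = (1 - \theta)^2\sigma_a^2\); the reverse map picks the root of the quadratic with \(|\theta| < 1\). The function names are hypothetical, and both maps assume \(0 < \theta < 1\) and \(\sigma_e > 0\).

```python
import math

def ima_to_local_trend(theta, sigma_a):
    """Map ARIMA(0,1,1) parameters to (sigma_e, sigma_eta); needs 0 < theta < 1."""
    sigma_e = sigma_a * math.sqrt(theta)      # from Eq. (11.5)
    sigma_eta = sigma_a * (1.0 - theta)       # from Eqs. (11.4) and (11.5)
    return sigma_e, sigma_eta

def local_trend_to_ima(sigma_e, sigma_eta):
    """Map (sigma_e, sigma_eta) to (theta, sigma_a), choosing |theta| < 1."""
    ratio = 2.0 + (sigma_eta / sigma_e) ** 2  # equals (1 + theta^2) / theta
    theta = (ratio - math.sqrt(ratio ** 2 - 4.0)) / 2.0
    sigma_a = sigma_e / math.sqrt(theta)
    return theta, sigma_a

sigma_e, sigma_eta = ima_to_local_trend(theta=0.858, sigma_a=0.5184)
theta_back, sigma_a_back = local_trend_to_ima(sigma_e, sigma_eta)
print(f"sigma_e = {sigma_e:.4f}, sigma_eta = {sigma_eta:.4f}")
print(f"round trip: theta = {theta_back:.3f}, sigma_a = {sigma_a_back:.4f}")
```

With the inputs of Eq. (11.6), the first map gives \(\sigma_e \approx 0.480\) and \(\sigma_\eta \approx 0.0736\).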

Since \(\hat{\theta}\) is positive, we can transform the ARIMA(0,1,1) model into a local
trend model in Eqs. (11.1) and (11.2). The maximum-likelihood estimates (MLE)
of the two parameters are \(\hat{\sigma}_\eta = 0.0735\) and \(\hat{\sigma}_e = 0.4803\). The measurement errors
have a larger variance than the state innovations, confirming that intraday
high-frequency returns are subject to measurement errors. Details of estimation will be
discussed in Section 11.1.7. Here we treat the two estimates as given and use the
model to demonstrate application of the Kalman filter. Note that using the model
in Eq. (11.6) and the relation in Eqs. (11.4) and (11.5), we obtain \(\sigma_e = 0.480\) and
\(\sigma_\eta = 0.0736\). These values are close to the MLE shown above.


11.1.1 Statistical Inference


Return to the state-space model in Eqs. (11.1) and (11.2). The aim of the analysis
is to infer properties of the state \(\mu_t\) from the data \(\{y_t \mid t = 1, \ldots, T\}\) and the model.
Three types of inference are commonly discussed in the literature. They are
filtering, prediction, and smoothing. Let \(F_t = \{y_1, \ldots, y_t\}\) be the information available
at time \(t\) (inclusive) and assume that the model is known, including all parameters.
The three types of inference can briefly be described as follows:


• Filtering. Filtering means to recover the state variable \(\mu_t\) given \(F_t\), that is,
to remove the measurement errors from the data.


• Prediction. Prediction means to forecast \(\mu_{t+h}\) or \(y_{t+h}\) for \(h > 0\) given \(F_t\),
where \(t\) is the forecast origin.


• Smoothing. Smoothing is to estimate \(\mu_t\) given \(F_T\), where \(T > t\).


A simple analogy of the three types of inference is reading a handwritten note.
Filtering is figuring out the word you are reading based on knowledge accumulated
from the beginning of the note, predicting is to guess the next word, and smoothing
is deciphering a particular word once you have read through the note.

To describe the inference more precisely, we introduce some notation. Let
\(\mu_{t|j} = E(\mu_t \mid F_j)\) and \(\Sigma_{t|j} = \mathrm{Var}(\mu_t \mid F_j)\) be, respectively, the conditional mean and
variance of \(\mu_t\) given \(F_j\). Similarly, \(y_{t|j}\) denotes the conditional mean of \(y_t\) given
\(F_j\). Furthermore, let \(v_t = y_t - y_{t|t-1}\) and \(V_t = \mathrm{Var}(v_t \mid F_{t-1})\) be the 1-step-ahead
forecast error and its variance of \(y_t\) given \(F_{t-1}\). Note that the forecast error \(v_t\) is
independent of \(F_{t-1}\) so that the conditional variance is the same as the unconditional
variance; that is, \(\mathrm{Var}(v_t \mid F_{t-1}) = \mathrm{Var}(v_t)\). From Eq. (11.1),


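\[
y_{t|t-1} = E(y_t \mid F_{t-1}) = E(\mu_t + e_t \mid F_{t-1}) = E(\mu_t \mid F_{t-1}) = \mu_{t|t-1}.
\]

Consequently,

\[
v_t = y_t - y_{t|t-1} = y_t - \mu_{t|t-1} \tag{11.7}
\]

and

\[
\begin{aligned}
V_t &= \mathrm{Var}(y_t - \mu_{t|t-1} \mid F_{t-1}) = \mathrm{Var}(\mu_t + e_t - \mu_{t|t-1} \mid F_{t-1}) \\
&= \mathrm{Var}(\mu_t - \mu_{t|t-1} \mid F_{t-1}) + \mathrm{Var}(e_t \mid F_{t-1}) = \Sigma_{t|t-1} + \sigma_e^2.
\end{aligned} \tag{11.8}
\]

It is also easy to see that

\[
\begin{aligned}
E(v_t) &= E[E(v_t \mid F_{t-1})] = E[E(y_t - y_{t|t-1} \mid F_{t-1})] = E[y_{t|t-1} - y_{t|t-1}] = 0, \\
\mathrm{Cov}(v_t, y_j) &= E(v_t y_j) = E[E(v_t y_j \mid F_{t-1})] = E[y_j E(v_t \mid F_{t-1})] = 0, \quad j < t.
\end{aligned}
\]

Thus, as expected, the 1-step-ahead forecast error is uncorrelated with (hence,
independent of) \(y_j\) for \(j < t\). Furthermore, for the linear model in Eqs. (11.1) and (11.2),
\(\mu_{t|t} = E(\mu_t \mid F_t) = E(\mu_t \mid F_{t-1}, v_t)\) and \(\Sigma_{t|t} = \mathrm{Var}(\mu_t \mid F_t) = \mathrm{Var}(\mu_t \mid F_{t-1}, v_t)\). In
other words, the information set \(F_t\) can be written as \(F_t = \{F_{t-1}, y_t\} = \{F_{t-1}, v_t\}\).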

The following properties of multivariate normal distribution are useful in
studying the Kalman filter under normality. They can be shown via the multivariate linear
regression method or factorization of the joint density. See, also, Appendix B of
Chapter 8. For random vectors \(w\) and \(m\), denote the mean vectors and covariance
matrix as \(E(w) = \mu_w\), \(E(m) = \mu_m\), and \(\mathrm{Cov}(m, w) = \Sigma_{mw}\), respectively.


Theorem 11.1. Suppose that \(x\), \(y\), and \(z\) are three random vectors such that
their joint distribution is multivariate normal. In addition, assume that the
diagonal block covariance matrix \(\Sigma_{ww}\) is nonsingular for \(w = x, y, z\), and \(\Sigma_{yz} = 0\).
Then,


1. \(E(x \mid y) = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)\).

2. \(\mathrm{Var}(x \mid y) = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\).

3. \(E(x \mid y, z) = E(x \mid y) + \Sigma_{xz}\Sigma_{zz}^{-1}(z - \mu_z)\).

4. \(\mathrm{Var}(x \mid y, z) = \mathrm{Var}(x \mid y) - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx}\).

11.1.2 Kalman Filter 


The goal of the Kalman filter is to update knowledge of the state variable
recursively when a new data point becomes available. That is, knowing the conditional
distribution of \(\mu_t\) given \(F_{t-1}\) and the new data \(y_t\), we would like to obtain the
conditional distribution of \(\mu_t\) given \(F_t\), where, as before, \(F_j = \{y_1, \ldots, y_j\}\). Since
\(F_t = \{F_{t-1}, v_t\}\), giving \(y_t\) and \(F_{t-1}\) is equivalent to giving \(v_t\) and \(F_{t-1}\). Consequently,
to derive the Kalman filter, it suffices to consider the joint conditional distribution
of \((\mu_t, v_t)'\) given \(F_{t-1}\) before applying Theorem 11.1.

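The conditional distribution of \(v_t\) given \(F_{t-1}\) is normal with mean zero and
variance given in Eq. (11.8), and that of \(\mu_t\) given \(F_{t-1}\) is also normal with mean
\(\mu_{t|t-1}\) and variance \(\Sigma_{t|t-1}\). Furthermore, the joint distribution of \((\mu_t, v_t)'\) given
\(F_{t-1}\) is also normal. Thus, what remains to be solved is the conditional covariance
between \(\mu_t\) and \(v_t\) given \(F_{t-1}\). From the definition,

\[
\begin{aligned}
\mathrm{Cov}(\mu_t, v_t \mid F_{t-1}) &= E(\mu_t v_t \mid F_{t-1}) = E[\mu_t(y_t - \mu_{t|t-1}) \mid F_{t-1}] \quad \text{[by Eq. (11.7)]} \\
&= E[\mu_t(\mu_t + e_t - \mu_{t|t-1}) \mid F_{t-1}] \\
&= E[\mu_t(\mu_t - \mu_{t|t-1}) \mid F_{t-1}] + E(\mu_t e_t \mid F_{t-1}) \\
&= E[(\mu_t - \mu_{t|t-1})^2 \mid F_{t-1}] = \mathrm{Var}(\mu_t \mid F_{t-1}) = \Sigma_{t|t-1},
\end{aligned} \tag{11.9}
\]

where we have used the fact that \(E[\mu_{t|t-1}(\mu_t - \mu_{t|t-1}) \mid F_{t-1}] = 0\). Putting the
results together, we have

\[
\begin{bmatrix} \mu_t \\ v_t \end{bmatrix}_{F_{t-1}} \sim N\left( \begin{bmatrix} \mu_{t|t-1} \\ 0 \end{bmatrix}, \begin{bmatrix} \Sigma_{t|t-1} & \Sigma_{t|t-1} \\ \Sigma_{t|t-1} & V_t \end{bmatrix} \right).
\]

By Theorem 11.1, the conditional distribution of \(\mu_t\) given \(F_t\) is normal with mean
and variance

\[
\mu_{t|t} = \mu_{t|t-1} + \frac{\Sigma_{t|t-1}v_t}{V_t} = \mu_{t|t-1} + K_t v_t, \tag{11.10}
\]
\[
\Sigma_{t|t} = \Sigma_{t|t-1} - \frac{\Sigma_{t|t-1}^2}{V_t} = \Sigma_{t|t-1}(1 - K_t), \tag{11.11}
\]

where \(K_t = \Sigma_{t|t-1}/V_t\) is commonly referred to as the Kalman gain, which is the
regression coefficient of \(\mu_t\) on \(v_t\). From Eq. (11.10), Kalman gain is the factor that
governs the contribution of the new shock \(v_t\) to the state variable \(\mu_t\).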

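Next, one can make use of the knowledge of \(\mu_t\) given \(F_t\) to predict \(\mu_{t+1}\) via
Eq. (11.2). Specifically, we have

\[
\mu_{t+1|t} = E(\mu_t + \eta_t \mid F_t) = E(\mu_t \mid F_t) = \mu_{t|t}, \tag{11.12}
\]
\[
\Sigma_{t+1|t} = \mathrm{Var}(\mu_{t+1} \mid F_t) = \mathrm{Var}(\mu_t \mid F_t) + \mathrm{Var}(\eta_t) = \Sigma_{t|t} + \sigma_\eta^2. \tag{11.13}
\]

Once the new data \(y_{t+1}\) is observed, one can repeat the above procedure to update
knowledge of \(\mu_{t+1}\). This is the famous Kalman filter algorithm proposed by Kalman
(1960).

In summary, putting Eqs. (11.7) to (11.13) together and conditioning on the
initial assumption that \(\mu_1\) is distributed as \(N(\mu_{1|0}, \Sigma_{1|0})\), the Kalman filter for the
local trend model is as follows:

\[
\begin{aligned}
v_t &= y_t - \mu_{t|t-1}, \\
V_t &= \Sigma_{t|t-1} + \sigma_e^2, \\
K_t &= \Sigma_{t|t-1}/V_t, \\
\mu_{t+1|t} &= \mu_{t|t-1} + K_t v_t, \\
\Sigma_{t+1|t} &= \Sigma_{t|t-1}(1 - K_t) + \sigma_\eta^2, \qquad t = 1, \ldots, T.
\end{aligned} \tag{11.14}
\]
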

There are many ways to derive the Kalman filter. We use Theorem 11.1, which
describes some properties of multivariate normal distribution, for its simplicity. In
practice, the choice of initial values \(\Sigma_{1|0}\) and \(\mu_{1|0}\) requires some attention and we
shall discuss it later in Section 11.1.6. For the local trend model in Eqs. (11.1) and
(11.2), the two parameters \(\sigma_e\) and \(\sigma_\eta\) can be estimated via the maximum-likelihood
method. Again, the Kalman filter is useful in evaluating the likelihood function of
the data in estimation. We shall discuss estimation in Section 11.1.7.

Figure 11.2 Time plots of output of Kalman filter applied to daily realized log volatility of Alcoa
stock based on local trend state-space model: (a) filtered state \(\mu_{t|t}\) and (b) 1-step-ahead forecast error \(v_t\).


Example 11.1 (Continued). To illustrate application of the Kalman filter, we
use the fitted state-space model for daily realized volatility of Alcoa stock returns
and apply the Kalman filter algorithm to the data with \(\Sigma_{1|0} = \infty\) and \(\mu_{1|0} = 0\). The
choice of these initial values will be discussed in Section 11.1.6. Figure 11.2(a)
shows the time plot of the filtered state variable \(\mu_{t|t}\), and Figure 11.2(b) is the time
plot of the 1-step-ahead forecast error \(v_t\). Compared with Figure 11.1, the filtered
states are smoother. The forecast errors appear to be stable and center around zero.
These forecast errors are out-of-sample 1-step-ahead prediction errors.
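
A minimal Python sketch of the recursion in Eq. (11.14) is given below. It is not the program used to produce Figure 11.2: the function name and the toy data are hypothetical, and an infinite \(\Sigma_{1|0}\) would have to be replaced in practice by a large finite value (or handled as in Section 11.1.6), which is an assumption of this sketch.

```python
def kalman_filter_local_trend(y, sigma2_e, sigma2_eta, mu_init, sigma_init):
    """Kalman filter of Eq. (11.14) for the local trend model.

    mu_init and sigma_init play the roles of mu_{1|0} and Sigma_{1|0}.
    Returns a dict of lists, one entry per observation.
    """
    mu_pred, sig_pred = mu_init, sigma_init
    out = {"v": [], "V": [], "K": [], "mu_filt": [], "sig_filt": []}
    for y_t in y:
        v = y_t - mu_pred                 # 1-step-ahead forecast error
        V = sig_pred + sigma2_e           # its variance, Eq. (11.8)
        K = sig_pred / V                  # Kalman gain
        mu_filt = mu_pred + K * v         # filtered mean, Eq. (11.10)
        sig_filt = sig_pred * (1.0 - K)   # filtered variance, Eq. (11.11)
        for key, val in zip(out, (v, V, K, mu_filt, sig_filt)):
            out[key].append(val)
        mu_pred = mu_filt                 # prediction, Eq. (11.12)
        sig_pred = sig_filt + sigma2_eta  # prediction variance, Eq. (11.13)
    return out

# Toy data with unit variances, chosen so the steps are easy to follow by hand.
toy = kalman_filter_local_trend([1.0, 2.0], 1.0, 1.0, mu_init=0.0, sigma_init=1.0)
print(toy["K"], toy["mu_filt"])

# Parameter values of Example 11.1 with a large finite Sigma_{1|0} (assumption).
demo = kalman_filter_local_trend([0.5, 0.9, 0.2], 0.4803 ** 2, 0.0735 ** 2,
                                 mu_init=0.0, sigma_init=1e7)
print(demo["mu_filt"])
```

With a very large \(\Sigma_{1|0}\), the first gain is close to 1, so the first filtered state is essentially \(y_1\) regardless of \(\mu_{1|0}\).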


11.1.3 Properties of Forecast Error 


The 1-step-ahead forecast errors \(\{v_t\}\) are useful in many applications, hence it pays
to study carefully their properties. Given the initial values \(\Sigma_{1|0}\) and \(\mu_{1|0}\), which
are independent of \(y_t\), the Kalman filter enables us to compute \(v_t\) recursively as a
linear function of \(\{y_1, \ldots, y_t\}\). Specifically, by repeated substitutions,

\[
\begin{aligned}
v_1 &= y_1 - \mu_{1|0}, \\
v_2 &= y_2 - \mu_{2|1} = y_2 - \mu_{1|0} - K_1(y_1 - \mu_{1|0}), \\
v_3 &= y_3 - \mu_{3|2} = y_3 - \mu_{1|0} - K_2(y_2 - \mu_{1|0}) - K_1(1 - K_2)(y_1 - \mu_{1|0}),
\end{aligned}
\]

and so on. This transformation can be written in matrix form as

\[
v = K(y - \mu_{1|0}1_T), \tag{11.15}
\]

where \(v = (v_1, \ldots, v_T)'\), \(y = (y_1, \ldots, y_T)'\), \(1_T\) is the \(T\)-dimensional vector of
ones, and \(K\) is a lower triangular matrix defined as

\[
K = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
k_{21} & 1 & 0 & \cdots & 0 \\
k_{31} & k_{32} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
k_{T1} & k_{T2} & k_{T3} & \cdots & 1
\end{bmatrix},
\]

where \(k_{i,i-1} = -K_{i-1}\) and \(k_{ij} = -(1 - K_{i-1})(1 - K_{i-2}) \cdots (1 - K_{j+1})K_j\) for
\(i = 2, \ldots, T\) and \(j = 1, \ldots, i - 2\). It should be noted that, from the definition, the
Kalman gain \(K_t\) does not depend on \(\mu_{1|0}\) or the data \(\{y_1, \ldots, y_t\}\); it depends on
\(\Sigma_{1|0}\), \(\sigma_e^2\), and \(\sigma_\eta^2\).

The transformation in Eq. (11.15) has several important implications. First, $\{v_t\}$
are mutually independent under the normality assumption. To show this, consider
the joint probability density function of the data

$$ p(y_1, \ldots, y_T) = p(y_1) \prod_{j=2}^{T} p(y_j \mid F_{j-1}). $$

Equation (11.15) indicates that the transformation from $y_t$ to $v_t$ has a unit Jacobian
so that $p(v) = p(y)$. Furthermore, since $\mu_{1|0}$ is given, $p(v_1) = p(y_1)$. Consequently,
the joint probability density function of $v$ is

$$ p(v) = p(y) = p(y_1) \prod_{j=2}^{T} p(y_j \mid F_{j-1}) = p(v_1) \prod_{j=2}^{T} p(v_j) = \prod_{j=1}^{T} p(v_j). $$

This shows that $\{v_t\}$ are mutually independent.

Second, the Kalman filter provides a Cholesky decomposition of the covariance
matrix of $y$. To see this, let $\Omega = \mathrm{Cov}(y)$. Equation (11.15) shows that
$\mathrm{Cov}(v) = K \Omega K'$. On the other hand, $\{v_t\}$ are mutually independent with
$\mathrm{Var}(v_t) = V_t$. Therefore, $K \Omega K' = \mathrm{diag}\{V_1, \ldots, V_T\}$, which is precisely a
Cholesky decomposition of $\Omega$. The elements $k_{ij}$ of the matrix $K$ thus have some
nice interpretations; see Chapter 10.


State Error Recursion 
Turn to the estimation error of the state variable $\mu_t$. Define

$$ x_t = \mu_t - \mu_{t|t-1} $$

as the forecast error of the state variable $\mu_t$ given data $F_{t-1}$. From Section 11.1.1,
$\mathrm{Var}(x_t \mid F_{t-1}) = \Sigma_{t|t-1}$. From the Kalman filter in Eq. (11.14),

$$ v_t = y_t - \mu_{t|t-1} = \mu_t + e_t - \mu_{t|t-1} = x_t + e_t, $$

and

$$
\begin{aligned}
x_{t+1} &= \mu_{t+1} - \mu_{t+1|t} = \mu_t + \eta_t - (\mu_{t|t-1} + K_t v_t) \\
&= x_t + \eta_t - K_t v_t = x_t + \eta_t - K_t (x_t + e_t) = L_t x_t + \eta_t - K_t e_t,
\end{aligned}
$$

where $L_t = 1 - K_t = 1 - \Sigma_{t|t-1}/V_t = (V_t - \Sigma_{t|t-1})/V_t = \sigma_e^2 / V_t$. Consequently,
for the state errors, we have

$$ v_t = x_t + e_t, \qquad x_{t+1} = L_t x_t + \eta_t - K_t e_t, \qquad t = 1, \ldots, T, \qquad (11.16) $$

where $x_1 = \mu_1 - \mu_{1|0}$. Equation (11.16) is in the form of a time-varying state-space
model with $x_t$ being the state variable and $v_t$ the observation.


11.1.4 State Smoothing 


Next we consider the estimation of the state variables $\{\mu_1, \ldots, \mu_T\}$ given the data
$F_T$ and the model. That is, given the state-space model in Eqs. (11.1) and (11.2),
we wish to obtain the conditional distribution $\mu_t \mid F_T$ for all t. To this end, we first
recall some facts available about the model:


• All distributions involved are normal so that we can write the conditional
distribution of $\mu_t$ given $F_T$ as $N(\mu_{t|T}, \Sigma_{t|T})$, where $t \le T$. We refer to $\mu_{t|T}$
as the smoothed state at time t and $\Sigma_{t|T}$ as the smoothed state variance.


• Based on the properties of $\{v_t\}$ shown in Section 11.1.3, $\{v_1, \ldots, v_T\}$ are
mutually independent and are linear functions of $\{y_1, \ldots, y_T\}$.


• If $y_1, \ldots, y_T$ are fixed, then $F_{t-1}$ and $\{v_t, \ldots, v_T\}$ are fixed, and vice versa.


• $\{v_t, \ldots, v_T\}$ are independent of $F_{t-1}$ with mean zero and variance
$\mathrm{Var}(v_j) = V_j$ for $j \ge t$.


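Applying Theorem 11.1(3) to the conditional joint distribution of
$(\mu_t, v_t, \ldots, v_T)$ given $F_{t-1}$, we have

$$
\begin{aligned}
\mu_{t|T} &= E(\mu_t \mid F_T) = E(\mu_t \mid F_{t-1}, v_t, \ldots, v_T) \\
&= E(\mu_t \mid F_{t-1}) + \mathrm{Cov}[\mu_t, (v_t, \ldots, v_T)'] \, \mathrm{Cov}[(v_t, \ldots, v_T)']^{-1} (v_t, \ldots, v_T)' \\
&= \mu_{t|t-1} +
\begin{bmatrix} \mathrm{Cov}(\mu_t, v_t) \\ \mathrm{Cov}(\mu_t, v_{t+1}) \\ \vdots \\ \mathrm{Cov}(\mu_t, v_T) \end{bmatrix}'
\begin{bmatrix} V_t & 0 & \cdots & 0 \\ 0 & V_{t+1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & V_T \end{bmatrix}^{-1}
\begin{bmatrix} v_t \\ v_{t+1} \\ \vdots \\ v_T \end{bmatrix} \\
&= \mu_{t|t-1} + \sum_{j=t}^{T} \mathrm{Cov}(\mu_t, v_j) V_j^{-1} v_j. \qquad (11.17)
\end{aligned}
$$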


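From the definition and independence of $\{v_t\}$, $\mathrm{Cov}(\mu_t, v_j) = \mathrm{Cov}(x_t, v_j)$ for
$j = t, \ldots, T$, and

$$
\begin{aligned}
\mathrm{Cov}(x_t, v_t) &= E[x_t (x_t + e_t)] = \mathrm{Var}(x_t) = \Sigma_{t|t-1}, \\
\mathrm{Cov}(x_t, v_{t+1}) &= E[x_t (x_{t+1} + e_{t+1})] = E[x_t (L_t x_t + \eta_t - K_t e_t)] = \Sigma_{t|t-1} L_t.
\end{aligned}
$$

Similarly, we have

$$
\begin{aligned}
\mathrm{Cov}(x_t, v_{t+2}) &= E[x_t (x_{t+2} + e_{t+2})] = \cdots = \Sigma_{t|t-1} L_t L_{t+1}, \\
&\;\;\vdots \\
\mathrm{Cov}(x_t, v_T) &= E[x_t (x_T + e_T)] = \cdots = \Sigma_{t|t-1} \prod_{j=t}^{T-1} L_j.
\end{aligned}
$$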


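Consequently, Eq. (11.17) becomes

$$
\begin{aligned}
\mu_{t|T} &= \mu_{t|t-1} + \Sigma_{t|t-1} \frac{v_t}{V_t} + \Sigma_{t|t-1} L_t \frac{v_{t+1}}{V_{t+1}} + \Sigma_{t|t-1} L_t L_{t+1} \frac{v_{t+2}}{V_{t+2}} + \cdots \\
&= \mu_{t|t-1} + \Sigma_{t|t-1} q_{t-1},
\end{aligned}
$$

where

$$ q_{t-1} = \frac{v_t}{V_t} + L_t \frac{v_{t+1}}{V_{t+1}} + L_t L_{t+1} \frac{v_{t+2}}{V_{t+2}} + \cdots + \left( \prod_{j=t}^{T-1} L_j \right) \frac{v_T}{V_T} \qquad (11.18) $$

is a weighted linear combination of the innovations $\{v_t, \ldots, v_T\}$. This weighted
sum satisfies

$$
\begin{aligned}
q_{t-1} &= \frac{v_t}{V_t} + L_t \left[ \frac{v_{t+1}}{V_{t+1}} + L_{t+1} \frac{v_{t+2}}{V_{t+2}} + \cdots + \left( \prod_{j=t+1}^{T-1} L_j \right) \frac{v_T}{V_T} \right] \\
&= \frac{v_t}{V_t} + L_t q_t.
\end{aligned}
$$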

Therefore, using the initial value $q_T = 0$, we have the backward recursion

$$ q_{t-1} = \frac{v_t}{V_t} + L_t q_t, \qquad t = T, T-1, \ldots, 1. \qquad (11.19) $$

Putting Eqs. (11.17) and (11.19) together, we have a backward recursive algorithm
to compute the smoothed state variables:

$$ q_{t-1} = V_t^{-1} v_t + L_t q_t, \qquad \mu_{t|T} = \mu_{t|t-1} + \Sigma_{t|t-1} q_{t-1}, \qquad t = T, \ldots, 1, \qquad (11.20) $$

where $q_T = 0$, and $\mu_{t|t-1}$, $\Sigma_{t|t-1}$, and $L_t$ are available from the Kalman filter in
Eq. (11.14).


Smoothed State Variance 
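The variance of the smoothed state variable $\mu_{t|T}$ can be derived in a similar manner
via Theorem 11.1(4). Specifically, letting $v_t^T = (v_t, \ldots, v_T)'$, we have

$$
\begin{aligned}
\Sigma_{t|T} &= \mathrm{Var}(\mu_t \mid F_T) = \mathrm{Var}(\mu_t \mid F_{t-1}, v_t, \ldots, v_T) \\
&= \mathrm{Var}(\mu_t \mid F_{t-1}) - \mathrm{Cov}[\mu_t, (v_t^T)'] \, \mathrm{Cov}[(v_t^T)]^{-1} \, \mathrm{Cov}[\mu_t, (v_t^T)] \\
&= \Sigma_{t|t-1} - \sum_{j=t}^{T} [\mathrm{Cov}(\mu_t, v_j)]^2 V_j^{-1}, \qquad (11.21)
\end{aligned}
$$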

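where $\mathrm{Cov}(\mu_t, v_j) = \mathrm{Cov}(x_t, v_j)$ are given earlier after Eq. (11.17). Thus,

$$
\begin{aligned}
\Sigma_{t|T} &= \Sigma_{t|t-1} - \Sigma_{t|t-1}^2 \frac{1}{V_t} - \Sigma_{t|t-1}^2 L_t^2 \frac{1}{V_{t+1}} - \cdots - \Sigma_{t|t-1}^2 \left( \prod_{j=t}^{T-1} L_j^2 \right) \frac{1}{V_T} \\
&= \Sigma_{t|t-1} - \Sigma_{t|t-1}^2 M_{t-1}, \qquad (11.22)
\end{aligned}
$$

where

$$ M_{t-1} = \frac{1}{V_t} + L_t^2 \frac{1}{V_{t+1}} + L_t^2 L_{t+1}^2 \frac{1}{V_{t+2}} + \cdots + \left( \prod_{j=t}^{T-1} L_j^2 \right) \frac{1}{V_T} $$

is a weighted linear combination of the inverses of variances of the 1-step-ahead
forecast errors after time t - 1. Let $M_T = 0$ because no 1-step-ahead forecast error
is available after time index T. The statistic $M_{t-1}$ can be written as

$$
\begin{aligned}
M_{t-1} &= \frac{1}{V_t} + L_t^2 \left[ \frac{1}{V_{t+1}} + L_{t+1}^2 \frac{1}{V_{t+2}} + \cdots + \left( \prod_{j=t+1}^{T-1} L_j^2 \right) \frac{1}{V_T} \right] \\
&= \frac{1}{V_t} + L_t^2 M_t, \qquad t = T, T-1, \ldots, 1.
\end{aligned}
$$

Note that, by the independence of $\{v_t\}$ and Eq. (11.18),

$$ \mathrm{Var}(q_{t-1}) = \frac{1}{V_t} + L_t^2 \frac{1}{V_{t+1}} + \cdots + \left( \prod_{j=t}^{T-1} L_j^2 \right) \frac{1}{V_T} = M_{t-1}. $$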
Figure 11.3 Filtered state variable $\mu_{t|t}$ and its 95% pointwise confidence interval for daily log realized
volatility of Alcoa stock returns based on fitted local-trend state-space model.


Combining the results, variances of the smoothed state variables can be computed
efficiently via the backward recursion

$$ M_{t-1} = V_t^{-1} + L_t^2 M_t, \qquad \Sigma_{t|T} = \Sigma_{t|t-1} - \Sigma_{t|t-1}^2 M_{t-1}, \qquad t = T, \ldots, 1, \qquad (11.23) $$

where $M_T = 0$.
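The two backward recursions in Eqs. (11.20) and (11.23) can be run in a single pass after the forward filter. The Python sketch below is an illustration rather than the SsfPack implementation; the function name, data, and parameter values are hypothetical. It stores $\mu_{t|t-1}$, $\Sigma_{t|t-1}$, $v_t$, $V_t$, and $L_t$ from the forward pass and then iterates $q_{t-1}$ and $M_{t-1}$ backward from $q_T = M_T = 0$.

```python
def filter_and_smooth(y, sig_e2, sig_eta2, mu0, P0):
    """Local trend model: forward Kalman filter, then backward smoothing."""
    T = len(y)
    mu_pred, P_pred, v, V, L = [], [], [], [], []
    mu, P = mu0, P0
    for yt in y:                              # forward pass
        Vt = P + sig_e2
        Kt = P / Vt
        mu_pred.append(mu)                    # mu_{t|t-1}
        P_pred.append(P)                      # Sigma_{t|t-1}
        v.append(yt - mu)
        V.append(Vt)
        L.append(1.0 - Kt)                    # L_t = sigma_e^2 / V_t
        mu = mu + Kt * (yt - mu)
        P = P * (1.0 - Kt) + sig_eta2
    q, M = 0.0, 0.0                           # q_T = 0 and M_T = 0
    mu_smooth, P_smooth = [0.0] * T, [0.0] * T
    for t in range(T - 1, -1, -1):            # backward pass
        q = v[t] / V[t] + L[t] * q            # q_{t-1}, Eq. (11.19)
        M = 1.0 / V[t] + L[t] ** 2 * M        # M_{t-1}, Eq. (11.23)
        mu_smooth[t] = mu_pred[t] + P_pred[t] * q
        P_smooth[t] = P_pred[t] - P_pred[t] ** 2 * M
    return mu_pred, P_pred, mu_smooth, P_smooth


# Hypothetical data and parameters, for illustration only.
y = [1.0, 1.3, 0.8, 1.1, 0.9, 1.2]
mu_pred, P_pred, mu_s, P_s = filter_and_smooth(y, 0.23, 0.0054, 0.0, 10.0)
for t in range(len(y)):
    print(t + 1, round(mu_s[t], 4), round(P_s[t], 4))
```

At the last time point the smoothed variance reduces to $\Sigma_{T|T-1} \sigma_e^2 / V_T$, the filtered variance, which gives a quick sanity check on an implementation.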


Example 11.1 (Continued). Applying the Kalman filter and state-smoothing
algorithms in Eqs. (11.20) and (11.23) to the daily realized volatility of Alcoa stock
using the fitted state-space model, we can easily compute the filtered state $\mu_{t|t}$ and
the smoothed state $\mu_{t|T}$ and their variances. Figure 11.3 shows the filtered state
variable and its 95% pointwise confidence interval, whereas Figure 11.4 provides
the time plot of smoothed state variable and its 95% pointwise confidence interval.
As expected, the smoothed state variables are smoother than the filtered state
variables. The confidence intervals for the smoothed state variables are also narrower
than those of the filtered state variables. Note that the width of the 95% confidence
interval of $\mu_{1|1}$ depends on the initial value $\Sigma_{1|0}$.
Figure 11.4 Smoothed state variable $\mu_{t|T}$ and its 95% pointwise confidence interval for daily log
realized volatility of Alcoa stock returns based on fitted local-trend state-space model.


11.1.5 Missing Values 


An advantage of the state-space model is in handling missing values. Suppose
that the observations $\{y_t\}_{t=\ell+1}^{\ell+h}$ are missing, where $h \ge 1$ and $1 \le \ell < T$. There
are several ways to handle missing values in state-space formulation. Here we
discuss a method that keeps the original time scale and model form. For
$t \in \{\ell+1, \ldots, \ell+h\}$, we can use Eq. (11.2) to express $\mu_t$ as a linear combination of
$\mu_{\ell+1}$ and $\{\eta_j\}_{j=\ell+1}^{t-1}$. Specifically,

$$ \mu_t = \mu_{t-1} + \eta_{t-1} = \cdots = \mu_{\ell+1} + \sum_{j=\ell+1}^{t-1} \eta_j, $$

where it is understood that the summation term is zero if its lower limit is greater
than its upper limit. Therefore, for $t \in \{\ell+1, \ldots, \ell+h\}$,

$$
\begin{aligned}
E(\mu_t \mid F_{t-1}) &= E(\mu_t \mid F_\ell) = \mu_{\ell+1|\ell}, \\
\mathrm{Var}(\mu_t \mid F_{t-1}) &= \mathrm{Var}(\mu_t \mid F_\ell) = \Sigma_{\ell+1|\ell} + (t - \ell - 1) \sigma_\eta^2.
\end{aligned}
$$

Consequently, we have

$$ \mu_{t|t-1} = \mu_{t-1|t-2}, \qquad \Sigma_{t|t-1} = \Sigma_{t-1|t-2} + \sigma_\eta^2, \qquad (11.24) $$

for $t = \ell+2, \ldots, \ell+h$. These results show that we can continue to apply the
Kalman filter algorithm in Eq. (11.14) by taking $v_t = 0$ and $K_t = 0$ for
$t = \ell+1, \ldots, \ell+h$. This is rather natural because when $y_t$ is missing, there is no new
innovation or new Kalman gain so that $v_t = 0$ and $K_t = 0$.
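The rule $v_t = 0$, $K_t = 0$ for missing observations is easy to add to a filter loop. The Python sketch below is hypothetical (the function name, the use of `None` to mark a missing value, and the numbers are assumptions made for illustration); it shows that the predicted state stays constant and the predicted variance grows by $\sigma_\eta^2$ per missing period, as in Eq. (11.24).

```python
def kalman_filter_missing(y, sig_e2, sig_eta2, mu0, P0):
    """Local trend filter in which None marks a missing observation."""
    mu, P = mu0, P0
    mu_pred, P_pred = [], []
    for yt in y:
        mu_pred.append(mu)                   # mu_{t|t-1}
        P_pred.append(P)                     # Sigma_{t|t-1}
        if yt is None:
            vt, Kt = 0.0, 0.0                # no innovation, no gain
        else:
            vt, Kt = yt - mu, P / (P + sig_e2)
        mu = mu + Kt * vt                    # unchanged when y_t is missing
        P = P * (1.0 - Kt) + sig_eta2        # grows by sig_eta2 when missing
    return mu_pred, P_pred


# Hypothetical series with observations 3 and 4 missing.
y = [1.0, 1.2, None, None, 0.9]
mu_pred, P_pred = kalman_filter_missing(y, 0.23, 0.0054, 0.0, 1.0)
print(mu_pred[2:], P_pred[2:])
```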


11.1.6 Effect of Initialization 


In this section, we consider the effects of initial condition $\mu_1 \sim N(\mu_{1|0}, \Sigma_{1|0})$ on
the Kalman filter and state smoothing. From the Kalman filter in Eq. (11.14),

$$ v_1 = y_1 - \mu_{1|0}, \qquad V_1 = \Sigma_{1|0} + \sigma_e^2, $$

and, by Eqs. (11.10) to (11.13),

$$
\begin{aligned}
\mu_{2|1} &= \mu_{1|0} + \frac{\Sigma_{1|0}}{V_1} v_1 = \mu_{1|0} + \frac{\Sigma_{1|0}}{\Sigma_{1|0} + \sigma_e^2} (y_1 - \mu_{1|0}), \\
\Sigma_{2|1} &= \Sigma_{1|0} \left( 1 - \frac{\Sigma_{1|0}}{V_1} \right) + \sigma_\eta^2 = \frac{\Sigma_{1|0}}{\Sigma_{1|0} + \sigma_e^2} \sigma_e^2 + \sigma_\eta^2.
\end{aligned}
$$

Therefore, letting $\Sigma_{1|0}$ increase to infinity, we have $\mu_{2|1} = y_1$ and $\Sigma_{2|1} = \sigma_e^2 + \sigma_\eta^2$.
This is equivalent to treating $y_1$ as fixed and assuming $\mu_1 \sim N(y_1, \sigma_e^2)$. In the
literature, this approach to initializing the Kalman filter is called diffuse initialization
because a very large $\Sigma_{1|0}$ means one is uncertain about the initial condition.

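Next, turn to the effect of diffuse initialization on state smoothing. It is obvious
that based on the results of Kalman filtering, state smoothing is not affected by the
diffuse initialization for $t = T, \ldots, 2$. Thus, we focus on $\mu_1$ given $F_T$. From Eq.
(11.20) and the definition of $L_1 = 1 - K_1 = V_1^{-1} \sigma_e^2$,

$$
\begin{aligned}
\mu_{1|T} &= \mu_{1|0} + \Sigma_{1|0} q_0 \\
&= \mu_{1|0} + \Sigma_{1|0} \left( \frac{v_1}{\Sigma_{1|0} + \sigma_e^2} + \frac{\sigma_e^2}{\Sigma_{1|0} + \sigma_e^2} q_1 \right) \\
&= \mu_{1|0} + \frac{\Sigma_{1|0}}{\Sigma_{1|0} + \sigma_e^2} (v_1 + \sigma_e^2 q_1).
\end{aligned}
$$

Letting $\Sigma_{1|0} \to \infty$, we have $\mu_{1|T} = \mu_{1|0} + v_1 + \sigma_e^2 q_1 = y_1 + \sigma_e^2 q_1$. Furthermore,
from Eq. (11.23) and using $V_1 = \Sigma_{1|0} + \sigma_e^2$, we have

$$
\begin{aligned}
\Sigma_{1|T} &= \Sigma_{1|0} - \Sigma_{1|0}^2 \left[ \frac{1}{\Sigma_{1|0} + \sigma_e^2} + \left( \frac{\sigma_e^2}{\Sigma_{1|0} + \sigma_e^2} \right)^2 M_1 \right] \\
&= \Sigma_{1|0} \left( 1 - \frac{\Sigma_{1|0}}{\Sigma_{1|0} + \sigma_e^2} \right) - \left( \frac{\Sigma_{1|0}}{\Sigma_{1|0} + \sigma_e^2} \right)^2 \sigma_e^4 M_1 \\
&= \left( \frac{\Sigma_{1|0}}{\Sigma_{1|0} + \sigma_e^2} \right) \sigma_e^2 - \left( \frac{\Sigma_{1|0}}{\Sigma_{1|0} + \sigma_e^2} \right)^2 \sigma_e^4 M_1.
\end{aligned}
$$

Thus, letting $\Sigma_{1|0} \to \infty$, we obtain $\Sigma_{1|T} = \sigma_e^2 - \sigma_e^4 M_1$.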

Based on the prior discussion, we suggest using diffuse initialization when little
is known about the initial value $\mu_1$. However, it might be hard to justify the use
of a random variable with infinite variance in real applications. If necessary, one
can treat $\mu_1$ as an additional parameter of the state-space model and estimate it
jointly with other parameters. This latter approach is closely related to the exact
maximum-likelihood estimation of Chapters 2 and 8.
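The diffuse limit for the first filter step can be seen numerically by letting the initial variance grow. The following Python sketch uses hypothetical values and a hypothetical function name; it applies the first Kalman update of the local trend model and shows $\mu_{2|1}$ approaching $y_1$ and $\Sigma_{2|1}$ approaching $\sigma_e^2 + \sigma_\eta^2$. The variance update is written as $\Sigma_{1|0} \sigma_e^2 / V_1 + \sigma_\eta^2$, which is algebraically the same as $\Sigma_{1|0}(1 - K_1) + \sigma_\eta^2$ but avoids cancellation error when $\Sigma_{1|0}$ is very large.

```python
def first_update(y1, mu0, P0, sig_e2, sig_eta2):
    """First Kalman step of the local trend model: (mu_{2|1}, Sigma_{2|1})."""
    V1 = P0 + sig_e2
    K1 = P0 / V1
    mu21 = mu0 + K1 * (y1 - mu0)
    P21 = P0 * sig_e2 / V1 + sig_eta2   # same as P0 * (1 - K1) + sig_eta2
    return mu21, P21


# Hypothetical values; P0 increases toward the diffuse case.
for P0 in (1.0, 1e2, 1e4, 1e10):
    print(P0, first_update(y1=1.7, mu0=0.0, P0=P0, sig_e2=0.23, sig_eta2=0.0054))
```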


11.1.7 Estimation 


In this section, we consider the estimation of $\sigma_e$ and $\sigma_\eta$ of the local trend model
in Eqs. (11.1) and (11.2). Based on properties of forecast errors discussed in
Section 11.1.3, the Kalman filter provides an efficient way to evaluate the
likelihood function of the data for estimation. Specifically, the likelihood function
under normality is

$$
\begin{aligned}
p(y_1, \ldots, y_T \mid \sigma_e, \sigma_\eta) &= p(y_1 \mid \sigma_e, \sigma_\eta) \prod_{t=2}^{T} p(y_t \mid F_{t-1}, \sigma_e, \sigma_\eta) \\
&= p(y_1 \mid \sigma_e, \sigma_\eta) \prod_{t=2}^{T} p(v_t \mid F_{t-1}, \sigma_e, \sigma_\eta),
\end{aligned}
$$

where $y_1 \sim N(\mu_{1|0}, V_1)$ and $v_t = (y_t - \mu_{t|t-1}) \sim N(0, V_t)$. Consequently,
assuming $\mu_{1|0}$ and $\Sigma_{1|0}$ are known, and taking the logarithms, we have

$$ \ln[L(\sigma_e, \sigma_\eta)] = -\frac{T}{2} \ln(2\pi) - \frac{1}{2} \sum_{t=1}^{T} \left[ \ln(V_t) + \frac{v_t^2}{V_t} \right], \qquad (11.25) $$

which involves $v_t$ and $V_t$. Therefore, the log-likelihood function, including cases
with missing values, can be evaluated recursively via the Kalman filter. Many
software packages, such as Matlab, RATS, and S-Plus, perform state-space model
estimation via a Kalman filter algorithm. In this chapter, we use the SsfPack program
developed by Koopman, Shephard, and Doornik (1999) and available in S-Plus and
OX. Both SsfPack and OX are free and can be downloaded from their websites.
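To make the prediction error decomposition in Eq. (11.25) concrete, the Python sketch below evaluates the log-likelihood of the local trend model with a proper (non-diffuse) initial variance. It is an illustration only, not the SsfPack routine; the function name and the numerical values are hypothetical. In practice the function would be passed to a numerical optimizer over $(\sigma_e, \sigma_\eta)$.

```python
import math


def loglik(y, sig_e, sig_eta, mu0, P0):
    """Gaussian log-likelihood of the local trend model via the Kalman filter."""
    mu, P = mu0, P0
    ll = -0.5 * len(y) * math.log(2.0 * math.pi)
    for yt in y:
        Vt = P + sig_e ** 2                      # forecast error variance V_t
        vt = yt - mu                             # forecast error v_t
        ll -= 0.5 * (math.log(Vt) + vt * vt / Vt)
        Kt = P / Vt
        mu = mu + Kt * vt
        P = P * (1.0 - Kt) + sig_eta ** 2
    return ll


# Hypothetical data; compare two candidate parameter pairs.
y = [1.0, 1.3, 0.8, 1.1, 0.9, 1.2]
print(loglik(y, 0.5, 0.1, mu0=1.0, P0=1.0))
print(loglik(y, 0.2, 0.1, mu0=1.0, P0=1.0))
```

For two observations the result can be compared with the bivariate normal density of $(y_1, y_2)$, whose variances are $\Sigma_{1|0} + \sigma_e^2$ and $\Sigma_{1|0} + \sigma_\eta^2 + \sigma_e^2$ and whose covariance is $\Sigma_{1|0}$.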


11.1.8 S-Plus Commands Used 


We provide here the SsfPack commands used to perform analysis of the daily
realized volatility of Alcoa stock returns. Only brief explanations are given. For
further details of the commands used, see Durbin and Koopman (2001, Section
6.6). S-Plus uses specific notation to specify a state-space model; see Table 11.1.
The notation must be followed closely. In Table 11.2, we give some commands
and their functions.
TABLE 11.1 State-Space Form and Notation in S-Plus


State-Space Parameter    S-Plus Name
$\delta$                 mDelta
$\Phi$                   mPhi
$\Omega$                 mOmega
$\Sigma$                 mSigma

TABLE 11.2 Some Commands of SsfPack Package 


Command Function 

SsfFit Maximum-likelihood estimation 
CheckSsf Create “Ssf” object in S-Plus 

KalmanFil Perform Kalman filtering 

KalmanSmo Perform state smoothing 

SsfMomentEst with task “STFIL” Compute filtered state and variance 
SsfMomentEst with task “STSMO” Compute smoothed state and variance 
SsfCondDens with task “STSMO” Compute smoothed state without variance 


In our analysis, we first perform maximum-likelihood estimation of the
state-space model in Eqs. (11.1) and (11.2) to obtain estimates of $\sigma_e$ and $\sigma_\eta$. The initial
values used are $\Sigma_{1|0} = -1$ and $\mu_{1|0} = 0$, where -1 signifies diffuse initialization,
that is, $\Sigma_{1|0}$ is very large. We then treat the fitted model as given to perform Kalman
filtering and state smoothing.


SsfPack and S-Plus Commands for State-Space Model 


> da = read.table(file='aa-rv-0304.txt',header=F) % load data
> y = log(da[,1]) % log(RV)
> ltm.start=c(3,1) % Initial parameter values
> P1 = -1 % Initialization of Kalman filter
> a1 = 0
> ltm.m=function(parm){ % Specify a function for the
+ sigma.eta=parm[1] % local trend model.
+ sigma.e=parm[2]
+ ssf.m=list(mPhi=as.matrix(c(1,1)),
+ mOmega=diag(c(sigma.eta^2,sigma.e^2)),
+ mSigma=as.matrix(c(P1,a1)))
+ CheckSsf(ssf.m)
+ }
% perform estimation
> ltm.mle=SsfFit(ltm.start,y,"ltm.m",lower=c(0,0),
+ upper=c(100,100))
> ltm.mle$parameters
[1] 0.07350827 0.48026284
> sigma.eta=ltm.mle$parameters[1]
> sigma.eta
[1] 0.07350827
> sigma.e=ltm.mle$parameters[2]
> sigma.e
[1] 0.4802628
% Specify a state-space model in S-Plus.
> ssf.ltm.list=list(mPhi=as.matrix(c(1,1)),
+ mOmega=diag(c(sigma.eta^2,sigma.e^2)),
+ mSigma=as.matrix(c(P1,a1)))
% check validity of the specified model.
> ssf.ltm=CheckSsf(ssf.ltm.list)
> ssf.ltm
$mPhi:
     [,1]
[1,]    1
[2,]    1
$mOmega:
          [,1]      [,2]
[1,] 0.0054035 0.0000000
[2,] 0.0000000 0.2306524
$mSigma:
     [,1]
[1,]   -1
[2,]    0
...
$cSt:
[1] 1
attr(, "class"):
[1] "ssf"
% Apply Kalman filter
> KalmanFil.ltm=KalmanFil(y,ssf.ltm,task="STFIL")
> names(KalmanFil.ltm)
[1] "mOut" "innov" "std.innov" "mGain" "loglike"
[6] "loglike.conc" "dVar" "mEst" "mOffP" "task"
[11] "err" "call"
> par(mfcol=c(2,1)) % Obtain plot
> plot(KalmanFil.ltm$mEst[,1],xlab='day',
+ ylab='filtered state',type='l')
> title(main='(a) Filtered state variable')
> plot(KalmanFil.ltm$mOut[,1],xlab='day',
+ ylab='v(t)',type='l')
> title(main='(b) Prediction error')
% Obtain residuals and their variances
> KalmanSmo.ltm=KalmanSmo(KalmanFil.ltm,ssf.ltm)
> names(KalmanSmo.ltm)
[1] "state.residuals" "response.residuals" "state.variance"
[4] "response.variance" "aux.residuals" "scores"
[7] "call"
% Filtered states
> FiledEst.ltm=SsfMomentEst(y,ssf.ltm,task="STFIL")
> names(FiledEst.ltm)
[1] "state.moment" "state.variance" "response.moment"
[4] "response.variance" "task"
% Smoothed states
> SmoedEst.ltm=SsfMomentEst(y,ssf.ltm,task="STSMO")
> names(SmoedEst.ltm)
[1] "state.moment" "state.variance" "response.moment"
[4] "response.variance" "task"

% Obtain plots of filtered and smoothed states with 95% C.I.
> up=FiledEst.ltm$state.moment+
+ 2*sqrt(FiledEst.ltm$state.variance)
> lw=FiledEst.ltm$state.moment-
+ 2*sqrt(FiledEst.ltm$state.variance)
> par(mfcol=c(1,1))
> plot(FiledEst.ltm$state.moment,type='l',xlab='day',
+ ylab='value',ylim=c(-0.1,2.5))
> lines(1:340,up,lty=2)
> lines(1:340,lw,lty=2)
> title(main='Filtered state variable')
> up=SmoedEst.ltm$state.moment+
+ 2*sqrt(SmoedEst.ltm$state.variance)
> lw=SmoedEst.ltm$state.moment-
+ 2*sqrt(SmoedEst.ltm$state.variance)
> plot(SmoedEst.ltm$state.moment,type='l',xlab='day',
+ ylab='value',ylim=c(-0.1,2.5))
> lines(1:340,up,lty=2)
> lines(1:340,lw,lty=2)
> title(main='Smoothed state variable')
% Model checking via standardized residuals
> resi=KalmanFil.ltm$mOut[,1]*sqrt(KalmanFil.ltm$mOut[,3])
> archTest(resi)
> autocorTest(resi)


For the daily realized volatility of Alcoa stock returns, the fitted local trend
model is adequate based on residual analysis. Specifically, given the parameter
estimates, we use the Kalman filter to obtain the 1-step-ahead forecast error $v_t$ and
its variance $V_t$. We then compute the standardized forecast error $\tilde{v}_t = v_t / \sqrt{V_t}$ and
check the serial correlations and ARCH effects of $\{\tilde{v}_t\}$. We found that Q(25) =
23.37 (0.56) for the standardized forecast errors, and the LM test statistic for ARCH
effect is 18.48 (0.82) for 25 lags, where the number in parentheses denotes the p
value.


11.2 LINEAR STATE-SPACE MODELS 


We now consider the general state-space model. Many dynamic time series models 
in economics and finance can be represented in state-space form. Examples include 
the ARIMA models, dynamic linear models with unobserved components, time- 
varying regression models, and stochastic volatility models. A general Gaussian 
linear state-space model assumes the form 


$$ s_{t+1} = d_t + T_t s_t + R_t \eta_t, \qquad (11.26) $$
$$ y_t = c_t + Z_t s_t + e_t, \qquad (11.27) $$

where $s_t = (s_{1t}, \ldots, s_{mt})'$ is an m-dimensional state vector, $y_t = (y_{1t}, \ldots, y_{kt})'$ is a
k-dimensional observation vector, $d_t$ and $c_t$ are m- and k-dimensional deterministic
vectors, $T_t$ and $Z_t$ are $m \times m$ and $k \times m$ coefficient matrices, $R_t$ is an $m \times n$ matrix
often consisting of a subset of columns of the $m \times m$ identity matrix, and $\{\eta_t\}$ and
$\{e_t\}$ are n- and k-dimensional Gaussian white noise series such that

$$ \eta_t \sim N(0, Q_t), \qquad e_t \sim N(0, H_t), $$

where $Q_t$ and $H_t$ are positive-definite matrices. We assume that $\{e_t\}$ and $\{\eta_t\}$ are
independent, but this condition can be relaxed if necessary. The initial state $s_1$ is
$N(\mu_{1|0}, \Sigma_{1|0})$, where $\mu_{1|0}$ and $\Sigma_{1|0}$ are given, and is independent of $e_t$ and $\eta_t$ for
$t > 0$.

Equation (11.27) is the measurement or observation equation that relates the
vector of observations $y_t$ to the state vector $s_t$, the explanatory variable $c_t$, and
the measurement error $e_t$. Equation (11.26) is the state or transition equation that
describes a first-order Markov chain to govern the state transition with innovation
$\eta_t$. The matrices $T_t$, $R_t$, $Q_t$, $Z_t$, and $H_t$ are known and referred to as system
matrices. These matrices are often sparse, and they can be functions of some
parameters $\theta$, which can be estimated by the maximum-likelihood method.

The state-space model in Eqs. (11.26) and (11.27) can be rewritten in a compact 
form as 
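$$ \begin{bmatrix} s_{t+1} \\ y_t \end{bmatrix} = \delta_t + \Phi_t s_t + u_t, \qquad (11.28) $$

where

$$ \delta_t = \begin{bmatrix} d_t \\ c_t \end{bmatrix}, \qquad \Phi_t = \begin{bmatrix} T_t \\ Z_t \end{bmatrix}, \qquad u_t = \begin{bmatrix} R_t \eta_t \\ e_t \end{bmatrix}, $$

and $\{u_t\}$ is a sequence of Gaussian white noises with mean zero and covariance
matrix

$$ \Omega_t = \mathrm{Cov}(u_t) = \begin{bmatrix} R_t Q_t R_t' & 0 \\ 0 & H_t \end{bmatrix}. $$

The case of diffuse initialization is achieved by using

$$ \Sigma_{1|0} = \Sigma_* + \lambda \Sigma_\infty, $$

where $\Sigma_*$ and $\Sigma_\infty$ are $m \times m$ symmetric positive-definite matrices and $\lambda$ is a large
real number, which can approach infinity. In S-Plus and SsfPack, the notation

$$ \Sigma = \begin{bmatrix} \Sigma_{1|0} \\ \mu_{1|0}' \end{bmatrix}_{(m+1) \times m} $$

is used; see the notation in Table 11.1.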
In many applications, the system matrices are time invariant. However, these 
matrices can be time varying, making the state-space model flexible. 


11.3 MODEL TRANSFORMATION 


To appreciate the flexibility of the state-space model, we rewrite some well-known 
econometric and financial models in state-space form. 


11.3.1 CAPM with Time-Varying Coefficients 


First, consider the capital asset pricing model (CAPM) with time-varying intercept 
and slope. The model is 


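$$
\begin{aligned}
r_t &= \alpha_t + \beta_t r_{M,t} + e_t, & e_t &\sim N(0, \sigma_e^2), \qquad (11.29) \\
\alpha_{t+1} &= \alpha_t + \eta_t, & \eta_t &\sim N(0, \sigma_\eta^2), \\
\beta_{t+1} &= \beta_t + \epsilon_t, & \epsilon_t &\sim N(0, \sigma_\epsilon^2),
\end{aligned}
$$

where $r_t$ is the excess return of an asset, $r_{M,t}$ is the excess return of the market,
and the innovations $\{e_t, \eta_t, \epsilon_t\}$ are mutually independent. This CAPM allows for
time-varying $\alpha$ and $\beta$ that evolve as a random walk over time. We can easily
rewrite the model as

$$
\begin{bmatrix} \alpha_{t+1} \\ \beta_{t+1} \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \alpha_t \\ \beta_t \end{bmatrix} +
\begin{bmatrix} \eta_t \\ \epsilon_t \end{bmatrix},
\qquad
r_t = [1, \; r_{M,t}] \begin{bmatrix} \alpha_t \\ \beta_t \end{bmatrix} + e_t.
$$

Thus, the time-varying CAPM is a special case of the state-space model with
$s_t = (\alpha_t, \beta_t)'$, $T_t = R_t = I_2$, the $2 \times 2$ identity matrix, $d_t = 0$, $c_t = 0$, $Z_t = (1, r_{M,t})$,
$H_t = \sigma_e^2$, and $Q_t = \mathrm{diag}\{\sigma_\eta^2, \sigma_\epsilon^2\}$. Furthermore, in the form of Eq. (11.28), we
have $\delta_t = 0$, $u_t = (\eta_t, \epsilon_t, e_t)'$,

$$
\Phi_t = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & r_{M,t} \end{bmatrix}, \qquad
\Omega_t = \begin{bmatrix} \sigma_\eta^2 & 0 & 0 \\ 0 & \sigma_\epsilon^2 & 0 \\ 0 & 0 & \sigma_e^2 \end{bmatrix}.
$$

If diffuse initialization is used, then, in the SsfPack notation where -1 marks a diffuse element,

$$ \Sigma = \begin{bmatrix} -1 & 0 \\ 0 & -1 \\ 0 & 0 \end{bmatrix}. $$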

SsfPack/S-Plus Specification of Time-Varying Models 
For the CAPM in Eq. (11.29), $\Phi_t$ contains $r_{M,t}$, which is time varying. Some
special input is required to specify such a model in SsfPack. Basically, it requires
two additional variables: (a) a data matrix $X$ that stores $Z_t$ and (b) an index matrix
for $\Phi_t$ that identifies $Z_t$ from the data matrix. The notation for index matrices of
the state-space model in Eq. (11.28) is given in Table 11.3. Note that the matrix
$J_\Phi$ must have the same dimension as $\Phi_t$. The elements of $J_\Phi$ are all set to -1
except the elements for which the corresponding elements of $\Phi_t$ are time varying.
The nonnegative index value of $J_\Phi$ indicates the column of the data matrix $X$,
which contains the time-varying values.

TABLE 11.3 Notation and Name Used in SsfPack/S-Plus for Time-Varying
State-Space Model


Index Matrix                 Name Used in SsfPack/S-Plus
$J_\delta$                   mJDelta
$J_\Phi$                     mJPhi
$J_\Omega$                   mJOmega

Time-Varying Data Matrix     Name Used in SsfPack/S-Plus
$X$                          mX


To illustrate, consider the monthly simple excess returns of General Motors
stock from January 1990 to December 2003 used in Chapter 9. The monthly simple
excess return of the S&P 500 composite index is used as the market return. The
specification of a time-varying CAPM requires values of the variances $\sigma_\eta^2$, $\sigma_\epsilon^2$, and
$\sigma_e^2$. Suppose that $(\sigma_\eta, \sigma_\epsilon, \sigma_e) = (0.02, 0.04, 0.1)$. The state-space specification for
the CAPM under SsfPack/S-Plus is given below:

> X.mtx=cbind(1,sp) % Here 'sp' is market excess returns.
> Phi.t = rbind(diag(2),rep(0,2))
> Sigma=-Phi.t
> sigma.eta=.02
> sigma.ep=.04
> sigma.e=.1
> Omega=diag(c(sigma.eta^2,sigma.ep^2,sigma.e^2))
> JPhi = matrix(-1,3,2) % Create a 3-by-2 matrix of -1.
> JPhi[3,1]=1
> JPhi[3,2]=2
> ssfi.tv.capm=list(mPhi=Phi.t,
+ mOmega=Omega,
+ mJPhi=JPhi,
+ mSigma=Sigma,
+ mX=X.mtx)
> ssfi.tv.capm
$mPhi:
     [,1] [,2]
[1,]    1    0
[2,]    0    1
[3,]    0    0
$mOmega:
      [,1]   [,2] [,3]
[1,] 4e-04 0.0000 0.00
[2,] 0e+00 0.0016 0.00
[3,] 0e+00 0.0000 0.01
$mJPhi:
     [,1] [,2]
[1,]   -1   -1
[2,]   -1   -1
[3,]    1    2
$mSigma:
     [,1] [,2]
[1,]   -1    0
[2,]    0   -1
[3,]    0    0
$mX:
numeric matrix: 168 rows, 2 columns.
                sp
  [1,] 1 -0.075187
  ...
[168,] 1  0.05002
11.3.2 ARMA Models


Consider a zero-mean ARMA(p, q) process $y_t$ of Chapter 2:

$$ \phi(B) y_t = \theta(B) a_t, \qquad a_t \sim N(0, \sigma_a^2), \qquad (11.30) $$

where $\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^i$ and $\theta(B) = 1 - \sum_{j=1}^{q} \theta_j B^j$, and p and q are
nonnegative integers. There are many ways to transform such an ARMA model into
a state-space form. We discuss three methods available in the literature. Let
$m = \max(p, q + 1)$ and rewrite the ARMA model in Eq. (11.30) as

$$ y_t = \sum_{i=1}^{m} \phi_i y_{t-i} + a_t - \sum_{j=1}^{m-1} \theta_j a_{t-j}, \qquad (11.31) $$

where $\phi_i = 0$ for $i > p$ and $\theta_j = 0$ for $j > q$. In particular, $\theta_m = 0$ because $m > q$.


Akaike’s Approach 

Akaike (1975) defines the state vector $s_t$ as the minimum collection of variables
that contains all the information needed to produce forecasts at the forecast origin
t. It turns out that, for the ARMA process in Eq. (11.30) with $m = \max(p, q + 1)$,
$s_t = (y_{t|t}, y_{t+1|t}, \ldots, y_{t+m-1|t})'$, where $y_{t+j|t} = E(y_{t+j} \mid F_t)$ is the conditional
expectation of $y_{t+j}$ given $F_t = \{y_1, \ldots, y_t\}$. Since $y_{t|t} = y_t$, the first element of $s_t$
is $y_t$. Thus, the observation equation is

$$ y_t = Z s_t, \qquad (11.32) $$

where $Z = (1, 0, \ldots, 0)_{1 \times m}$. We derive the transition equation in several steps.
First, from the definition,

$$ s_{1,t+1} = y_{t+1} = y_{t+1|t} + (y_{t+1} - y_{t+1|t}) = s_{2t} + a_{t+1}, \qquad (11.33) $$


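where $s_{it}$ is the ith element of $s_t$. Next, consider the MA representation of ARMA
models given in Chapter 2. That is,

$$ y_t = a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots = \sum_{i=0}^{\infty} \psi_i a_{t-i}, $$

where $\psi_0 = 1$ and other $\psi$ weights can be obtained by equating coefficients of $B^i$
in $1 + \sum_{i=1}^{\infty} \psi_i B^i = \theta(B)/\phi(B)$. In particular, we have

$$
\begin{aligned}
\psi_1 &= \phi_1 - \theta_1, \\
\psi_2 &= \phi_1 \psi_1 + \phi_2 - \theta_2, \\
&\;\;\vdots \\
\psi_{m-1} &= \phi_1 \psi_{m-2} + \phi_2 \psi_{m-3} + \cdots + \phi_{m-2} \psi_1 + \phi_{m-1} - \theta_{m-1} \\
&= \sum_{i=1}^{m-1} \phi_i \psi_{m-1-i} - \theta_{m-1}. \qquad (11.34)
\end{aligned}
$$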


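Using the MA representation, we have, for $j > 0$,

$$
\begin{aligned}
y_{t+j|t} &= E(y_{t+j} \mid F_t) = E\left( \sum_{i=0}^{\infty} \psi_i a_{t+j-i} \,\Big|\, F_t \right) \\
&= \psi_j a_t + \psi_{j+1} a_{t-1} + \psi_{j+2} a_{t-2} + \cdots
\end{aligned}
$$

and

$$
\begin{aligned}
y_{t+j|t+1} &= E(y_{t+j} \mid F_{t+1}) = \psi_{j-1} a_{t+1} + \psi_j a_t + \psi_{j+1} a_{t-1} + \cdots \\
&= \psi_{j-1} a_{t+1} + y_{t+j|t}.
\end{aligned}
$$

Thus, for $j > 0$, we have

$$ y_{t+j|t+1} = y_{t+j|t} + \psi_{j-1} a_{t+1}. \qquad (11.35) $$

This result is referred to as the forecast updating formula of ARMA models. It
provides a simple way to update the forecast from origin t to origin t + 1 when
$y_{t+1}$ becomes available. The new information of $y_{t+1}$ is contained in the innovation
$a_{t+1}$, and the time-t forecast is revised based on this new information with weight
$\psi_{j-1}$ to compute the time-(t + 1) forecast.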

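Finally, from Eq. (11.31) and using $E(a_{t+j} \mid F_{t+1}) = 0$ for $j > 1$, we have

$$ y_{t+m|t+1} = \sum_{i=1}^{m} \phi_i y_{t+m-i|t+1} - \theta_{m-1} a_{t+1}. $$

Using Eq. (11.35), the prior equation becomes

$$
\begin{aligned}
y_{t+m|t+1} &= \sum_{i=1}^{m-1} \phi_i (y_{t+m-i|t} + \psi_{m-i-1} a_{t+1}) + \phi_m y_{t|t} - \theta_{m-1} a_{t+1} \\
&= \sum_{i=1}^{m} \phi_i y_{t+m-i|t} + \left( \sum_{i=1}^{m-1} \phi_i \psi_{m-1-i} - \theta_{m-1} \right) a_{t+1} \\
&= \sum_{i=1}^{m} \phi_i y_{t+m-i|t} + \psi_{m-1} a_{t+1}, \qquad (11.36)
\end{aligned}
$$

where the last equality uses Eq. (11.34). Combining Eqs. (11.33) and (11.35) for
$j = 2, \ldots, m - 1$, and (11.36) together, we have

$$
\begin{bmatrix} y_{t+1} \\ y_{t+2|t+1} \\ \vdots \\ y_{t+m-1|t+1} \\ y_{t+m|t+1} \end{bmatrix} =
\begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
\phi_m & \phi_{m-1} & \phi_{m-2} & \cdots & \phi_1
\end{bmatrix}
\begin{bmatrix} y_t \\ y_{t+1|t} \\ \vdots \\ y_{t+m-2|t} \\ y_{t+m-1|t} \end{bmatrix} +
\begin{bmatrix} 1 \\ \psi_1 \\ \vdots \\ \psi_{m-2} \\ \psi_{m-1} \end{bmatrix} a_{t+1}. \qquad (11.37)
$$

Thus, the transition equation of Akaike's approach is

$$ s_{t+1} = T s_t + R \eta_t, \qquad \eta_t \sim N(0, \sigma_a^2), \qquad (11.38) $$

where $\eta_t = a_{t+1}$, and $T$ and $R$ are the coefficient matrices in Eq. (11.37).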
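The system matrices of Akaike's form follow mechanically from the AR and MA coefficients. The Python sketch below is an illustration with hypothetical function names; it uses the sign convention of Eq. (11.30), that is, $\theta(B) = 1 - \sum_j \theta_j B^j$. It computes the $\psi$ weights from the recursion $\psi_k = \sum_{i=1}^{\min(k,p)} \phi_i \psi_{k-i} - \theta_k$ (with $\theta_k = 0$ for $k > q$), which reduces to Eq. (11.34) for $k = m - 1$, and then fills in $T$ and $R$ of Eq. (11.37).

```python
def psi_weights(phi, theta, n):
    """psi_0, ..., psi_n for phi(B) y_t = theta(B) a_t, theta(B) = 1 - sum theta_j B^j."""
    psi = [1.0]
    for k in range(1, n + 1):
        val = -(theta[k - 1] if k <= len(theta) else 0.0)
        for i in range(1, min(k, len(phi)) + 1):
            val += phi[i - 1] * psi[k - i]
        psi.append(val)
    return psi


def akaike_form(phi, theta):
    """Transition matrix T and loading vector R of Eq. (11.37)."""
    m = max(len(phi), len(theta) + 1)
    phi_m = list(phi) + [0.0] * (m - len(phi))         # phi_i = 0 for i > p
    T = [[0.0] * m for _ in range(m)]
    for i in range(m - 1):
        T[i][i + 1] = 1.0                              # shift rows
    T[m - 1] = [phi_m[m - 1 - j] for j in range(m)]    # (phi_m, ..., phi_1)
    R = psi_weights(phi, theta, m - 1)                 # (1, psi_1, ..., psi_{m-1})
    return T, R


# ARMA(2,1) with phi = (1.2, -0.35) and theta_1 = 0.25.
T, R = akaike_form([1.2, -0.35], [0.25])
print(T)
print(R)
```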


Harvey’s Approach 

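Harvey (1993, Section 4.4) provides a state-space form with an m-dimensional
state vector $s_t$, the first element of which is $y_t$, that is, $s_{1t} = y_t$. The other elements
of $s_t$ are obtained recursively. From the ARMA(m, m - 1) model, we have

$$
\begin{aligned}
y_{t+1} &= \phi_1 y_t + \sum_{i=2}^{m} \phi_i y_{t+1-i} - \sum_{j=1}^{m-1} \theta_j a_{t+1-j} + a_{t+1} \\
&= \phi_1 s_{1t} + s_{2t} + \eta_t,
\end{aligned}
$$

where $s_{2t} = \sum_{i=2}^{m} \phi_i y_{t+1-i} - \sum_{j=1}^{m-1} \theta_j a_{t+1-j}$, $\eta_t = a_{t+1}$, and as defined earlier
$s_{1t} = y_t$. Focusing on $s_{2,t+1}$, we have

$$
\begin{aligned}
s_{2,t+1} &= \sum_{i=2}^{m} \phi_i y_{t+2-i} - \sum_{j=1}^{m-1} \theta_j a_{t+2-j} \\
&= \phi_2 y_t + \sum_{i=3}^{m} \phi_i y_{t+2-i} - \sum_{j=2}^{m-1} \theta_j a_{t+2-j} - \theta_1 a_{t+1} \\
&= \phi_2 s_{1t} + s_{3t} + (-\theta_1) \eta_t,
\end{aligned}
$$

where $s_{3t} = \sum_{i=3}^{m} \phi_i y_{t+2-i} - \sum_{j=2}^{m-1} \theta_j a_{t+2-j}$. Next, considering $s_{3,t+1}$, we have

$$
\begin{aligned}
s_{3,t+1} &= \sum_{i=3}^{m} \phi_i y_{t+3-i} - \sum_{j=2}^{m-1} \theta_j a_{t+3-j} \\
&= \phi_3 y_t + \sum_{i=4}^{m} \phi_i y_{t+3-i} - \sum_{j=3}^{m-1} \theta_j a_{t+3-j} + (-\theta_2) a_{t+1} \\
&= \phi_3 s_{1t} + s_{4t} + (-\theta_2) \eta_t,
\end{aligned}
$$

where $s_{4t} = \sum_{i=4}^{m} \phi_i y_{t+3-i} - \sum_{j=3}^{m-1} \theta_j a_{t+3-j}$. Repeating the procedure, we have
$s_{mt} = \sum_{i=m}^{m} \phi_i y_{t+m-1-i} - \sum_{j=m-1}^{m-1} \theta_j a_{t+m-1-j} = \phi_m y_{t-1} - \theta_{m-1} a_t$. Finally,

$$
\begin{aligned}
s_{m,t+1} &= \phi_m y_t - \theta_{m-1} a_{t+1} \\
&= \phi_m s_{1t} + (-\theta_{m-1}) \eta_t.
\end{aligned}
$$

Putting the prior equations together, we have a state-space form

$$ s_{t+1} = T s_t + R \eta_t, \qquad \eta_t \sim N(0, \sigma_a^2), \qquad (11.39) $$
$$ y_t = Z s_t, \qquad (11.40) $$

where the system matrices are time invariant defined as $Z = (1, 0, \ldots, 0)_{1 \times m}$,

$$
T = \begin{bmatrix}
\phi_1 & 1 & 0 & \cdots & 0 \\
\phi_2 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\phi_{m-1} & 0 & 0 & \cdots & 1 \\
\phi_m & 0 & 0 & \cdots & 0
\end{bmatrix}, \qquad
R = \begin{bmatrix} 1 \\ -\theta_1 \\ \vdots \\ -\theta_{m-2} \\ -\theta_{m-1} \end{bmatrix},
$$

and $d_t$, $c_t$, and $H_t$ are all zero. The model in Eqs. (11.39) and (11.40) has no
measurement errors. It has an advantage that the AR and MA coefficients are
directly used in the system matrices.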
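A quick way to confirm that Harvey's form reproduces the ARMA process is to drive both the state-space recursion and the ARMA difference equation with the same shocks. The Python sketch below is illustrative only: the function names are hypothetical, all presample values are set to zero, and $\theta(B) = 1 - \sum_j \theta_j B^j$ as in Eq. (11.30).

```python
import random


def harvey_form(phi, theta):
    """T and R of Eq. (11.39) for an ARMA(p, q) model."""
    m = max(len(phi), len(theta) + 1)
    phi_m = list(phi) + [0.0] * (m - len(phi))
    theta_m = list(theta) + [0.0] * (m - 1 - len(theta))
    T = [[0.0] * m for _ in range(m)]
    for i in range(m):
        T[i][0] = phi_m[i]               # first column holds phi_1, ..., phi_m
        if i + 1 < m:
            T[i][i + 1] = 1.0            # ones on the superdiagonal
    R = [1.0] + [-th for th in theta_m]  # (1, -theta_1, ..., -theta_{m-1})'
    return T, R


def simulate_state_space(phi, theta, shocks):
    """Iterate s_{t+1} = T s_t + R a_{t+1} from s_0 = 0 and return y_t = s_{1t}."""
    T, R = harvey_form(phi, theta)
    m = len(R)
    s = [0.0] * m
    ys = []
    for a in shocks:
        s = [sum(T[i][k] * s[k] for k in range(m)) + R[i] * a for i in range(m)]
        ys.append(s[0])
    return ys


def simulate_arma(phi, theta, shocks):
    """Direct ARMA recursion with zero presample values."""
    ys = []
    for t, a in enumerate(shocks):
        val = a
        for i, ph in enumerate(phi, start=1):
            if t - i >= 0:
                val += ph * ys[t - i]
        for j, th in enumerate(theta, start=1):
            if t - j >= 0:
                val -= th * shocks[t - j]
        ys.append(val)
    return ys


random.seed(0)
shocks = [random.gauss(0.0, 1.0) for _ in range(50)]
y_ss = simulate_state_space([1.2, -0.35], [0.25], shocks)
y_arma = simulate_arma([1.2, -0.35], [0.25], shocks)
print(max(abs(a - b) for a, b in zip(y_ss, y_arma)))
```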


Aoki’s Approach 

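Aoki (1987, Chapter 4) discusses several ways to convert an ARMA model into a
state-space form. First, consider the MA model, that is, $y_t = \theta(B) a_t$. In this case,
we can simply define $s_t = (a_{t-q}, a_{t-q+1}, \ldots, a_{t-1})'$ and obtain the state-space form

$$
\begin{bmatrix} a_{t-q+1} \\ a_{t-q+2} \\ \vdots \\ a_{t-1} \\ a_t \end{bmatrix} =
\begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix} a_{t-q} \\ a_{t-q+1} \\ \vdots \\ a_{t-2} \\ a_{t-1} \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} a_t, \qquad (11.41)
$$

$$ y_t = (-\theta_q, -\theta_{q-1}, \ldots, -\theta_1) s_t + a_t. $$

Note that, in this particular case, $a_t$ appears in both state and measurement
equations.

Next, consider the AR model, that is, $\phi(B) z_t = a_t$. Aoki (1987) introduces two
methods. The first method is a straightforward one by defining $s_t = (z_{t-p+1}, \ldots, z_t)'$
to obtain

$$
\begin{bmatrix} z_{t-p+2} \\ z_{t-p+3} \\ \vdots \\ z_t \\ z_{t+1} \end{bmatrix} =
\begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
\phi_p & \phi_{p-1} & \phi_{p-2} & \cdots & \phi_1
\end{bmatrix}
\begin{bmatrix} z_{t-p+1} \\ z_{t-p+2} \\ \vdots \\ z_{t-1} \\ z_t \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} a_{t+1}, \qquad (11.42)
$$

$$ z_t = (0, 0, \ldots, 0, 1) s_t. $$

The second method defines the state vector in the same way as the first method
except that $a_t$ is removed from the last element; that is, $s_t = z_t - a_t$ if $p = 1$ and
$s_t = (z_{t-p+1}, \ldots, z_{t-1}, z_t - a_t)'$ if $p > 1$. Simple algebra shows that

$$
\begin{bmatrix} z_{t-p+2} \\ z_{t-p+3} \\ \vdots \\ z_t \\ z_{t+1} - a_{t+1} \end{bmatrix} =
\begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
\phi_p & \phi_{p-1} & \phi_{p-2} & \cdots & \phi_1
\end{bmatrix}
\begin{bmatrix} z_{t-p+1} \\ z_{t-p+2} \\ \vdots \\ z_{t-1} \\ z_t - a_t \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \phi_1 \end{bmatrix} a_t, \qquad (11.43)
$$

$$ z_t = (0, 0, \ldots, 0, 1) s_t + a_t. $$

Again, $a_t$ appears in both transition and measurement equations.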
Turn to the ARMA(p, q) model $\phi(B) y_t = \theta(B) a_t$. For simplicity, we assume
$q < p$ and introduce an auxiliary variable $z_t = [1/\phi(B)] a_t$. Then, we have

$$ \phi(B) z_t = a_t, \qquad y_t = \theta(B) z_t. $$

Since $z_t$ is an AR(p) model, we can use the transition equation in Eq. (11.42) or
(11.43). If Eq. (11.42) is used, we can use $y_t = \theta(B) z_t$ to construct the measurement
equation as

$$ y_t = (-\theta_{p-1}, -\theta_{p-2}, \ldots, -\theta_1, 1) s_t, \qquad (11.44) $$

where it is understood that $p > q$ and $\theta_j = 0$ for $j > q$. On the other hand, if Eq.
(11.43) is used as the transition equation, we construct the measurement equation
as

$$ y_t = (-\theta_{p-1}, -\theta_{p-2}, \ldots, -\theta_1, 1) s_t + a_t. \qquad (11.45) $$

In summary, there are many state-space representations for an ARMA model.
Each representation has its pros and cons. For estimation and forecasting purposes,
one can choose any one of those representations. On the other hand, for a
time-invariant coefficient state-space model in Eqs. (11.26) and (11.27), one can use the
Cayley-Hamilton theorem to show that the observation $y_t$ follows an ARMA(m, m)
model, where m is the dimension of the state vector.


SsfPack Command 

In SsfPack/S-Plus, a command GetSsfArma can be used to transform an ARMA
model into a state-space form. Harvey's approach is used. To illustrate, consider
the AR(1) model

$$ y_t = 0.6 y_{t-1} + a_t, \qquad a_t \sim N(0, 0.4^2). $$

The state-space form of the model is

> ssf.ar1 = GetSsfArma(ar=0.6,sigma=0.4)
> ssf.ar1
$mPhi:
     [,1]
[1,]  0.6
[2,]  1.0
$mOmega:
     [,1] [,2]
[1,] 0.16    0
[2,] 0.00    0
$mSigma:
     [,1]
[1,] 0.25
[2,] 0.00

Since the AR(1) model is stationary, the program uses $\Sigma_{1|0} = \mathrm{Var}(y_t) = (0.4)^2 / (1 - 0.6^2) = 0.25$
and $\mu_{1|0} = 0$. These values appear in the matrix mSigma.
As a second example, consider the ARMA(2,1) model

$$ y_t = 1.2 y_{t-1} - 0.35 y_{t-2} + a_t - 0.25 a_{t-1}, \qquad a_t \sim N(0, 1.1^2). $$

The state-space form of the model is

> arma21.m = list(ar=c(1.2,-0.35),ma=c(-0.25),sigma=1.1)
> ssf.arma21 = GetSsfArma(model=arma21.m)
> ssf.arma21
$mPhi:
      [,1] [,2]
[1,]  1.20    1
[2,] -0.35    0
[3,]  1.00    0
$mOmega:
        [,1]      [,2] [,3]
[1,]  1.2100 -0.302500    0
[2,] -0.3025  0.075625    0
[3,]  0.0000  0.000000    0
$mSigma:
          [,1]       [,2]
[1,]  4.060709 -1.4874057
[2,] -1.487406  0.5730618
[3,]  0.000000  0.0000000

As expected, the output shows that

$$ T = \begin{bmatrix} 1.2 & 1 \\ -0.35 & 0 \end{bmatrix}, \qquad R = \begin{bmatrix} 1 \\ -0.25 \end{bmatrix}, \qquad Z = (1, 0), $$

and mPhi and mOmega follow the format of Eq. (11.28), and the covariance matrix
of $(s_{1t}, s_{2t})$ is used in mSigma, where $s_{1t} = y_t$ and $s_{2t} = -0.35 y_{t-1} - 0.25 a_t$.
Note that in SsfPack, the MA polynomial of an ARMA model assumes the form
$\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$, not the form $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$
commonly used in the literature.
11.3.3 Linear Regression Model 


Multiple linear regression models can also be represented in state-space form.
Consider the model

$$ y_t = x_t' \beta + e_t, \qquad e_t \sim N(0, \sigma_e^2), $$

where $x_t$ is a p-dimensional explanatory variable and $\beta$ is a p-dimensional
parameter vector. Let $s_t = \beta$ for all t. Then the model can be written as

$$ \begin{bmatrix} s_{t+1} \\ y_t \end{bmatrix} = \begin{bmatrix} I_p \\ x_t' \end{bmatrix} s_t + \begin{bmatrix} 0 \\ e_t \end{bmatrix}. \qquad (11.46) $$

Thus, the system matrices are $T_t = I_p$, $Z_t = x_t'$, $d_t = 0$, $c_t = 0$, $Q_t = 0$, and
$H_t = \sigma_e^2$. Since the state vector is fixed, a diffuse initialization should be used.
One can extend the regression model so that $\beta_t$ is random, say,

$$ \beta_{t+1} = \beta_t + R_t \eta_t, \qquad \eta_t \sim N(0, 1), $$

and $R_t = (\sigma_1, \ldots, \sigma_p)'$ with $\sigma_i \ge 0$. If $\sigma_i = 0$, then $\beta_i$ is time invariant.
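With $T_t = I_p$ and $Q_t = 0$, the Kalman filter for the regression state-space form reduces to recursive least squares, and with a very large initial variance the final state estimate is numerically close to the ordinary least squares estimate. The Python sketch below illustrates this under stated assumptions: the function name, the data, and the large constant standing in for diffuse initialization are all hypothetical.

```python
def regression_filter(X, y, sig_e2, big=1e7):
    """Kalman filter for s_t = beta (fixed state), with a large initial variance."""
    p = len(X[0])
    beta = [0.0] * p
    P = [[big if i == j else 0.0 for j in range(p)] for i in range(p)]
    for x, yt in zip(X, y):
        Px = [sum(P[i][j] * x[j] for j in range(p)) for i in range(p)]
        V = sum(x[i] * Px[i] for i in range(p)) + sig_e2    # forecast error variance
        v = yt - sum(x[i] * beta[i] for i in range(p))      # forecast error
        beta = [beta[i] + Px[i] * v / V for i in range(p)]  # gain is P x / V
        P = [[P[i][j] - Px[i] * Px[j] / V for j in range(p)] for i in range(p)]
    return beta, P


# Hypothetical data: intercept plus one regressor.
X = [[1.0, float(k)] for k in range(5)]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
beta, P = regression_filter(X, y, sig_e2=1.0)

# Ordinary least squares for comparison.
xbar = sum(row[1] for row in X) / len(X)
ybar = sum(y) / len(y)
slope = (sum((row[1] - xbar) * (yt - ybar) for row, yt in zip(X, y))
         / sum((row[1] - xbar) ** 2 for row in X))
print(beta, [ybar - slope * xbar, slope])
```

Adding a positive state innovation variance to the update of P would give a time-varying coefficient filter of the kind used for the CAPM in Section 11.3.1.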


SsfPack Command 

In SsfPack, the command GetSsfReg creates a state-space form for the multiple 
linear regression model. The command has an input argument that contains the data 
matrix of explanatory variables. To illustrate, consider the simple market model 


$$r_t = \beta_0 + \beta_1 r_{M,t} + e_t, \quad t = 1, \ldots, 168,$$

where $r_t$ is the return of an asset and $r_{M,t}$ is the market return, for example, the S&P 500 composite index return. The state-space form can be obtained as

> ssf.reg=GetSsfReg(cbind(1,sp)) % 'sp' is market return.
> ssf.reg
$mPhi:
     [,1] [,2]
[1,]    1    0
[2,]    0    1
[3,]    0    0
$mOmega:
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    1
$mSigma:
     [,1] [,2]
[1,]   -1    0
[2,]    0   -1
[3,]    0    0
$mJPhi:
     [,1] [,2]
[1,]   -1   -1
[2,]   -1   -1
[3,]    1    2
$mX:
numeric matrix: 168 rows, 2 columns.
                sp
  [1,] 1 -0.075187
  ...
[168,] 1  0.05002


11.3.4 Linear Regression Models with ARMA Errors 
Consider the regression model with ARMA(p, q) errors: 


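$$y_t = x_t'\beta + z_t, \quad \phi(B)z_t = \theta(B)a_t, \tag{11.47}$$

where $a_t \sim N(0, \sigma_a^2)$ and $x_t$ is a k-dimensional vector of explanatory variables. A special case of this model is the nonzero mean ARMA(p, q) model in which $x_t = 1$ for all t and $\beta$ becomes a scalar parameter. Let $s_t$ be a state vector for the $z_t$ series, for example, that defined in Eq. (11.39). We can define a state vector $s_t^*$ for $y_t$ as

$$s_t^* = \begin{bmatrix} s_t \\ \beta_t \end{bmatrix}, \tag{11.48}$$

where $\beta_t = \beta$ for all t. Then, a state-space form for $y_t$ is

$$s_{t+1}^* = T^* s_t^* + R^*\eta_t, \tag{11.49}$$
$$y_t = Z_t^* s_t^*, \tag{11.50}$$

where $Z_t^* = (1, 0, \ldots, 0, x_t')$ is a $1 \times (m+k)$ vector, $m = \max(p, q+1)$, and

$$T^* = \begin{bmatrix} T & 0 \\ 0 & I_k \end{bmatrix}, \quad R^* = \begin{bmatrix} R \\ 0 \end{bmatrix},$$

where T and R are defined in Eq. (11.39). In a compact form, we have the state-space model

$$\begin{bmatrix} s_{t+1}^* \\ y_t \end{bmatrix} = \begin{bmatrix} T^* \\ Z_t^* \end{bmatrix} s_t^* + \begin{bmatrix} R^*\eta_t \\ 0 \end{bmatrix}.$$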

SsfPack Command 

SsfPack uses the command GetSsfRegArma to construct a state-space form for linear regression models with ARMA errors. The arguments of the command can be found using the command args(GetSsfRegArma). They consist of a data matrix for the explanatory variables and the ARMA model specification. To illustrate, consider the model

$$y_t = \beta_0 + \beta_1 x_t + z_t, \quad t = 1, \ldots, 168,$$
$$z_t = 1.2z_{t-1} - 0.35z_{t-2} + a_t - 0.25a_{t-1}, \quad a_t \sim N(0, \sigma_a^2).$$

We use the notation X to denote the $T \times 2$ matrix of regressors $(1, x_t)$. A state-space form for the prior model can be obtained as


> ssf.reg.arma21=GetSsfRegArma(X,ar=c(1.2,-0.35),
+ ma=c(-0.25))
> ssf.reg.arma21
$mPhi:
      [,1] [,2] [,3] [,4]
[1,]  1.20    1    0    0
[2,] -0.35    0    0    0
[3,]  0.00    0    1    0
[4,]  0.00    0    0    1
[5,]  1.00    0    0    0
$mOmega:
      [,1]    [,2] [,3] [,4] [,5]
[1,]  1.00 -0.2500    0    0    0
[2,] -0.25  0.0625    0    0    0
[3,]  0.00  0.0000    0    0    0
[4,]  0.00  0.0000    0    0    0
[5,]  0.00  0.0000    0    0    0
$mSigma:
         [,1]      [,2] [,3] [,4]
[1,]  3.35595 -1.229260    0    0
[2,] -1.22926  0.473604    0    0
[3,]  0.00000  0.000000   -1    0
[4,]  0.00000  0.000000    0   -1
[5,]  0.00000  0.000000    0    0
$mJPhi:
     [,1] [,2] [,3] [,4]
[1,]   -1   -1   -1   -1
[2,]   -1   -1   -1   -1
[3,]   -1   -1   -1   -1
[4,]   -1   -1   -1   -1
[5,]   -1   -1    1    2
$mX:
numeric matrix: 168 rows, 2 columns.
              xt
  [1,] 1 0.4993
  ...
[168,] 1 0.7561
11.3.5 Scalar Unobserved Component Model


The basic univariate unobserved component model, or the structural time series model (STSM), assumes the form

$$y_t = \mu_t + \gamma_t + \varpi_t + e_t, \tag{11.51}$$

where $\mu_t$, $\gamma_t$, and $\varpi_t$ represent the unobserved trend, seasonal, and cycle components, respectively, and $e_t$ is the unobserved irregular component. In the literature, a nonstationary (possibly double-unit-root) model is commonly used for the trend component:

$$\mu_{t+1} = \mu_t + \beta_t + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2), \tag{11.52}$$
$$\beta_{t+1} = \beta_t + \varsigma_t, \quad \varsigma_t \sim N(0, \sigma_\varsigma^2),$$

where $\mu_1 \sim N(0, \xi)$ and $\beta_1 \sim N(0, \xi)$ with $\xi$ a large real number, for example, $\xi = 10^8$. See, for instance, Kitagawa and Gersch (1996). If $\sigma_\varsigma = 0$, then $\mu_t$ follows a random walk with drift $\beta_1$. If $\sigma_\varsigma = \sigma_\eta = 0$, then $\mu_t$ represents a linear deterministic trend.

The seasonal component $\gamma_t$ assumes the form

$$(1 + B + \cdots + B^{s-1})\gamma_t = \omega_t, \quad \omega_t \sim N(0, \sigma_\omega^2), \tag{11.53}$$

where s is the number of seasons in a year, that is, the period of the seasonality. If $\sigma_\omega = 0$, then the seasonal pattern is deterministic. The cycle component is postulated as


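$$\begin{bmatrix} \varpi_{t+1} \\ \varpi_{t+1}^* \end{bmatrix} = \delta \begin{bmatrix} \cos(\lambda_c) & \sin(\lambda_c) \\ -\sin(\lambda_c) & \cos(\lambda_c) \end{bmatrix} \begin{bmatrix} \varpi_t \\ \varpi_t^* \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ \varepsilon_t^* \end{bmatrix}, \tag{11.54}$$

where

$$\begin{bmatrix} \varepsilon_t \\ \varepsilon_t^* \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \sigma_\varepsilon^2(1 - \delta^2)I_2 \right),$$

$\varpi_0 \sim N(0, \sigma_\varepsilon^2)$, $\varpi_0^* \sim N(0, \sigma_\varepsilon^2)$, and $\mathrm{Cov}(\varpi_0, \varpi_0^*) = 0$. Here $\delta \in (0, 1]$ is called a damping factor, and the frequency of the cycle is $\lambda_c = 2\pi/q$ with q being the period. If $\delta = 1$, then the cycle becomes a deterministic sine-cosine wave.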
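To see how the damping factor and the frequency act in Eq. (11.54), the short Python sketch below iterates the cycle recursion without noise. It is an illustration only: the function name cycle_step and the choice q = 12 are assumptions made here. With $\delta = 1$ the pair $(\varpi_t, \varpi_t^*)$ rotates on a circle and returns to its starting point after q steps; with $\delta < 1$ its amplitude shrinks by the factor $\delta$ at every step.

```python
import math


def cycle_step(w, w_star, delta, lam, eps=0.0, eps_star=0.0):
    """One step of the cycle recursion: damped rotation plus optional shocks."""
    c, s = math.cos(lam), math.sin(lam)
    return (delta * (c * w + s * w_star) + eps,
            delta * (-s * w + c * w_star) + eps_star)


q = 12                      # period of the cycle (illustrative value)
lam = 2 * math.pi / q       # frequency lambda_c = 2*pi/q

# delta = 1: deterministic sine-cosine wave
state = (1.0, 0.0)
path = [state]
for _ in range(q):
    state = cycle_step(state[0], state[1], delta=1.0, lam=lam)
    path.append(state)

# delta = 0.9: damped cycle
state = (1.0, 0.0)
for _ in range(q):
    state = cycle_step(state[0], state[1], delta=0.9, lam=lam)
damped_amplitude = math.hypot(state[0], state[1])
```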


SsfPack/S-Plus Command 

The command GetSsfStsm constructs a state-space form for the structural time series model. It allows for 10 cycle components; see the output of the command args(GetSsfStsm). Table 11.4 provides a summary of the arguments and their corresponding symbols of the model. To illustrate, consider the local trend model in Eqs. (11.1) and (11.2) with $\sigma_e = 0.4$ and $\sigma_\eta = 0.2$. This is a special case of the scalar unobserved component model. One can obtain a state-space form as

> ssf.stsm=GetSsfStsm(irregular=0.4,level=0.2)
> ssf.stsm
$mPhi:
     [,1]
[1,]    1
[2,]    1
$mOmega:
     [,1] [,2]
[1,] 0.04 0.00
[2,] 0.00 0.16
$mSigma:
     [,1]
[1,]   -1
[2,]    0

TABLE 11.4 Arguments of Command GetSsfStsm in SsfPack/S-Plus

Argument         STSM parameter
irregular        $\sigma_e$
level            $\sigma_\eta$
slope            $\sigma_\varsigma$
seasonalDummy    $\sigma_\omega$, $s$
seasonalTrig     $\sigma_\omega$, $s$
seasonalHS       $\sigma_\omega$, $s$
cycle0           $\sigma_\varepsilon$, $\lambda_c$, $\delta$
...
cycle9           $\sigma_\varepsilon$, $\lambda_c$, $\delta$

11.4 KALMAN FILTER AND SMOOTHING 


In this section, we study the Kalman filter and various smoothing methods for 
the general state-space model in Eqs. (11.26) and (11.27). The derivation follows 
closely the steps taken in Section 11.1. Readers interested mainly in applications can skip this section on a first reading. A good reference for this section is Durbin and Koopman (2001, Chapter 4).


11.4.1 Kalman Filter 


Recall that the aim of the Kalman filter is to obtain recursively the conditional distribution of $s_{t+1}$ given the data $F_t = \{y_1, \ldots, y_t\}$ and the model. Since the conditional distribution involved is normal, it suffices to study the conditional mean and covariance matrix. Let $s_{j|i}$ and $\Sigma_{j|i}$ be the conditional mean and covariance matrix of $s_j$ given $F_i$, that is, $s_j|F_i \sim N(s_{j|i}, \Sigma_{j|i})$. From Eq. (11.26),

$$s_{t+1|t} = E(d_t + T_t s_t + R_t\eta_t|F_t) = d_t + T_t s_{t|t}, \tag{11.55}$$
$$\Sigma_{t+1|t} = \mathrm{Var}(T_t s_t + R_t\eta_t|F_t) = T_t\Sigma_{t|t}T_t' + R_t Q_t R_t'. \tag{11.56}$$


Similar to Section 11.1, let $y_{t|t-1}$ be the conditional mean of $y_t$ given $F_{t-1}$. From Eq. (11.27),

$$y_{t|t-1} = c_t + Z_t s_{t|t-1}.$$

Let

$$v_t = y_t - y_{t|t-1} = y_t - (c_t + Z_t s_{t|t-1}) = Z_t(s_t - s_{t|t-1}) + e_t \tag{11.57}$$

be the 1-step-ahead forecast error of $y_t$ given $F_{t-1}$. It is easy to see that (a) $E(v_t|F_{t-1}) = 0$; (b) $v_t$ is independent of $F_{t-1}$, that is, $\mathrm{Cov}(v_t, y_j) = 0$ for $1 \le j < t$; and (c) $\{v_t\}$ is a sequence of independent normal random vectors. Also, let $V_t = \mathrm{Var}(v_t|F_{t-1}) = \mathrm{Var}(v_t)$ be the covariance matrix of the 1-step-ahead forecast error. From Eq. (11.57), we have

$$V_t = \mathrm{Var}[Z_t(s_t - s_{t|t-1}) + e_t] = Z_t\Sigma_{t|t-1}Z_t' + H_t. \tag{11.58}$$


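Since $F_t = \{F_{t-1}, y_t\} = \{F_{t-1}, v_t\}$, we can apply Theorem 11.1 to obtain

$$\begin{aligned}
s_{t|t} &= E(s_t|F_t) = E(s_t|F_{t-1}, v_t) \\
&= E(s_t|F_{t-1}) + \mathrm{Cov}(s_t, v_t)[\mathrm{Var}(v_t)]^{-1}v_t \\
&= s_{t|t-1} + C_t V_t^{-1}v_t,
\end{aligned} \tag{11.59}$$

where $C_t = \mathrm{Cov}(s_t, v_t|F_{t-1})$ is given by

$$\begin{aligned}
C_t &= \mathrm{Cov}(s_t, v_t|F_{t-1}) = \mathrm{Cov}[s_t, Z_t(s_t - s_{t|t-1}) + e_t|F_{t-1}] \\
&= \mathrm{Cov}[s_t, Z_t(s_t - s_{t|t-1})|F_{t-1}] = \Sigma_{t|t-1}Z_t'.
\end{aligned}$$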


Here we assume that $V_t$ is invertible because $H_t$ is. Using Eqs. (11.55) and (11.59), we obtain

$$s_{t+1|t} = d_t + T_t s_{t|t-1} + T_t C_t V_t^{-1}v_t = d_t + T_t s_{t|t-1} + K_t v_t, \tag{11.60}$$

where

$$K_t = T_t C_t V_t^{-1} = T_t\Sigma_{t|t-1}Z_t'V_t^{-1}, \tag{11.61}$$

which is the Kalman gain at time t. Applying Theorem 11.1(2), we have

$$\begin{aligned}
\Sigma_{t|t} &= \mathrm{Var}(s_t|F_{t-1}, v_t) \\
&= \mathrm{Var}(s_t|F_{t-1}) - \mathrm{Cov}(s_t, v_t)[\mathrm{Var}(v_t)]^{-1}\mathrm{Cov}(s_t, v_t)' \\
&= \Sigma_{t|t-1} - C_t V_t^{-1}C_t' \\
&= \Sigma_{t|t-1} - \Sigma_{t|t-1}Z_t'V_t^{-1}Z_t\Sigma_{t|t-1}.
\end{aligned} \tag{11.62}$$

Plugging Eq. (11.62) into Eq. (11.56) and using Eq. (11.61), we obtain

$$\Sigma_{t+1|t} = T_t\Sigma_{t|t-1}L_t' + R_t Q_t R_t', \tag{11.63}$$

where

$$L_t = T_t - K_t Z_t.$$


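Putting the prior equations together, we obtain the celebrated Kalman filter for the state-space model in Eqs. (11.26) and (11.27). Given the starting values $s_{1|0}$ and $\Sigma_{1|0}$, the Kalman filter algorithm is

$$\begin{aligned}
v_t &= y_t - c_t - Z_t s_{t|t-1}, \\
V_t &= Z_t\Sigma_{t|t-1}Z_t' + H_t, \\
K_t &= T_t\Sigma_{t|t-1}Z_t'V_t^{-1}, \\
L_t &= T_t - K_t Z_t, \\
s_{t+1|t} &= d_t + T_t s_{t|t-1} + K_t v_t, \\
\Sigma_{t+1|t} &= T_t\Sigma_{t|t-1}L_t' + R_t Q_t R_t', \quad t = 1, \ldots, T.
\end{aligned} \tag{11.64}$$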
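The recursion in Eq. (11.64) is short enough to write out directly. The Python sketch below is an illustration for the scalar, time-invariant case only (all system matrices are numbers); the function name kalman_filter, the toy data, and the starting values are assumptions made here and are unrelated to SsfPack. It is applied to the local trend model with $\sigma_e = 0.4$ and $\sigma_\eta = 0.2$.

```python
def kalman_filter(y, T, Z, R, Q, H, s10, P10, d=0.0, c=0.0):
    """Scalar version of the Kalman filter recursion in Eq. (11.64)."""
    s, P = s10, P10                   # s_{1|0} and Sigma_{1|0}
    steps = []
    for yt in y:
        v = yt - c - Z * s            # 1-step-ahead forecast error v_t
        V = Z * P * Z + H             # its variance V_t
        K = T * P * Z / V             # Kalman gain K_t
        L = T - K * Z                 # L_t = T_t - K_t Z_t
        steps.append({"v": v, "V": V, "K": K, "L": L,
                      "s_pred": s, "P_pred": P})
        s = d + T * s + K * v         # s_{t+1|t}
        P = T * P * L + R * Q * R     # Sigma_{t+1|t}
    return steps, s, P


# Local trend model: T = Z = R = 1, Q = sigma_eta^2, H = sigma_e^2
y = [0.3, 0.1, -0.2, 0.4, 0.5]        # toy data, for illustration only
steps, s_next, P_next = kalman_filter(y, T=1.0, Z=1.0, R=1.0,
                                      Q=0.2 ** 2, H=0.4 ** 2,
                                      s10=0.0, P10=1.0)
```

The returned list keeps $v_t$, $V_t$, $K_t$, $L_t$, $s_{t|t-1}$, and $\Sigma_{t|t-1}$ for each t, which are the quantities a later smoothing pass would need.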


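If the filtered quantities $s_{t|t}$ and $\Sigma_{t|t}$ are also of interest, then we modify the filter to include the contemporaneous filtering equations in Eqs. (11.59) and (11.62). The resulting algorithm is

$$\begin{aligned}
v_t &= y_t - c_t - Z_t s_{t|t-1}, \\
C_t &= \Sigma_{t|t-1}Z_t', \\
V_t &= Z_t\Sigma_{t|t-1}Z_t' + H_t = Z_t C_t + H_t, \\
s_{t|t} &= s_{t|t-1} + C_t V_t^{-1}v_t, \\
\Sigma_{t|t} &= \Sigma_{t|t-1} - C_t V_t^{-1}C_t', \\
s_{t+1|t} &= d_t + T_t s_{t|t}, \\
\Sigma_{t+1|t} &= T_t\Sigma_{t|t}T_t' + R_t Q_t R_t'.
\end{aligned}$$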
Steady State

If the state-space model is time invariant, that is, all system matrices are time invariant, then the matrices $\Sigma_{t|t-1}$ converge to a constant matrix $\Sigma_*$, which is a solution of the matrix equation

$$\Sigma_* = T\Sigma_*T' - T\Sigma_*Z'V^{-1}Z\Sigma_*T' + RQR',$$

where $V = Z\Sigma_*Z' + H$. The solution that is reached after convergence to $\Sigma_*$ is referred to as the steady-state solution of the Kalman filter. Once the steady state is reached, $V_t$, $K_t$, and $\Sigma_{t+1|t}$ are all constant. This can lead to considerable saving in computing time.
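As an illustration of the steady state, the Python sketch below iterates the matrix equation above in the scalar case for the local trend model with $\sigma_e = 0.4$ and $\sigma_\eta = 0.2$. The function name riccati_step, the starting value, and the number of iterations are assumptions made here. For this model (T = Z = R = 1) the fixed point satisfies $\Sigma_*^2 - Q\Sigma_* - QH = 0$, so the iteration can be compared with the closed-form positive root.

```python
import math


def riccati_step(P, T, Z, R, Q, H):
    """One update Sigma_{t|t-1} -> Sigma_{t+1|t} for a scalar, time-invariant model."""
    V = Z * P * Z + H
    return T * P * T - T * P * Z * (1.0 / V) * Z * P * T + R * Q * R


Q, H = 0.2 ** 2, 0.4 ** 2       # sigma_eta^2 and sigma_e^2
P = 10.0                        # arbitrary starting value for Sigma_{1|0}
for _ in range(200):
    P = riccati_step(P, 1.0, 1.0, 1.0, Q, H)

P_closed = (Q + math.sqrt(Q * Q + 4 * Q * H)) / 2   # positive root of the quadratic
K_steady = P / (P + H)                              # steady-state Kalman gain
print(P, P_closed, K_steady)
```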


11.4.2 State Estimation Error and Forecast Error 
Define the state prediction error as

$$x_t = s_t - s_{t|t-1}.$$

From the definition, the covariance matrix of $x_t$ is $\mathrm{Var}(x_t|F_{t-1}) = \mathrm{Var}(s_t|F_{t-1}) = \Sigma_{t|t-1}$. Following Section 11.1, we investigate properties of $x_t$. First, from Eq. (11.57),

$$v_t = Z_t(s_t - s_{t|t-1}) + e_t = Z_t x_t + e_t.$$

Second, from Eqs. (11.64) and (11.26), and the prior equation, we have

$$\begin{aligned}
x_{t+1} &= s_{t+1} - s_{t+1|t} \\
&= T_t(s_t - s_{t|t-1}) + R_t\eta_t - K_t v_t \\
&= T_t x_t + R_t\eta_t - K_t(Z_t x_t + e_t) \\
&= L_t x_t + R_t\eta_t - K_t e_t,
\end{aligned}$$

where, as before, $L_t = T_t - K_t Z_t$. Consequently, we obtain a state-space form for $v_t$ as

$$v_t = Z_t x_t + e_t, \quad x_{t+1} = L_t x_t + R_t\eta_t - K_t e_t, \tag{11.65}$$

with $x_1 = s_1 - s_{1|0}$, for $t = 1, \ldots, T$.
Finally, similar to the local trend model in Section 11.1, we can show that the 1-step-ahead forecast errors $\{v_t\}$ are independent of each other and $\{v_t, \ldots, v_T\}$ is independent of $F_{t-1}$.
11.4.3 State Smoothing


State smoothing focuses on the conditional distribution of $s_t$ given $F_T$. Notice that (a) $F_{t-1}$ and $\{v_t, \ldots, v_T\}$ are independent and (b) $v_t$ are serially independent. We can apply Theorem 11.1 to the joint distribution of $s_t$ and $\{v_t, \ldots, v_T\}$ given $F_{t-1}$ and obtain

$$\begin{aligned}
s_{t|T} &= E(s_t|F_T) = E(s_t|F_{t-1}, v_t, \ldots, v_T) \\
&= E(s_t|F_{t-1}) + \sum_{j=t}^{T}\mathrm{Cov}(s_t, v_j)[\mathrm{Var}(v_j)]^{-1}v_j \\
&= s_{t|t-1} + \sum_{j=t}^{T}\mathrm{Cov}(s_t, v_j)V_j^{-1}v_j,
\end{aligned} \tag{11.66}$$

where the covariance matrices are conditional on $F_{t-1}$. The covariance matrices $\mathrm{Cov}(s_t, v_j)$ for $j = t, \ldots, T$ can be derived as follows. By Eq. (11.65),

$$\mathrm{Cov}(s_t, v_j) = E(s_t v_j') = E[s_t(Z_j x_j + e_j)'] = E(s_t x_j')Z_j', \quad j = t, \ldots, T. \tag{11.67}$$

Furthermore,

$$\begin{aligned}
E(s_t x_t') &= E[s_t(s_t - s_{t|t-1})'] = \mathrm{Var}(s_t) = \Sigma_{t|t-1}, \\
E(s_t x_{t+1}') &= E[s_t(L_t x_t + R_t\eta_t - K_t e_t)'] = \Sigma_{t|t-1}L_t', \\
E(s_t x_{t+2}') &= \Sigma_{t|t-1}L_t'L_{t+1}', \\
&\;\;\vdots \\
E(s_t x_T') &= \Sigma_{t|t-1}L_t' \cdots L_{T-1}'.
\end{aligned} \tag{11.68}$$

Plugging the prior two equations into Eq. (11.66), we have

$$\begin{aligned}
s_{T|T} &= s_{T|T-1} + \Sigma_{T|T-1}Z_T'V_T^{-1}v_T, \\
s_{T-1|T} &= s_{T-1|T-2} + \Sigma_{T-1|T-2}Z_{T-1}'V_{T-1}^{-1}v_{T-1} + \Sigma_{T-1|T-2}L_{T-1}'Z_T'V_T^{-1}v_T, \\
s_{t|T} &= s_{t|t-1} + \Sigma_{t|t-1}Z_t'V_t^{-1}v_t + \Sigma_{t|t-1}L_t'Z_{t+1}'V_{t+1}^{-1}v_{t+1} \\
&\quad + \cdots + \Sigma_{t|t-1}L_t'L_{t+1}' \cdots L_{T-1}'Z_T'V_T^{-1}v_T,
\end{aligned}$$

for $t = T-2, T-3, \ldots, 1$, where it is understood that $L_t' \cdots L_{T-1}' = I_m$ when $t = T$. These smoothed state vectors can be expressed as

$$s_{t|T} = s_{t|t-1} + \Sigma_{t|t-1}q_{t-1}, \tag{11.69}$$
where $q_{T-1} = Z_T'V_T^{-1}v_T$, $q_{T-2} = Z_{T-1}'V_{T-1}^{-1}v_{T-1} + L_{T-1}'Z_T'V_T^{-1}v_T$, and

$$q_{t-1} = Z_t'V_t^{-1}v_t + L_t'Z_{t+1}'V_{t+1}^{-1}v_{t+1} + \cdots + L_t'L_{t+1}' \cdots L_{T-1}'Z_T'V_T^{-1}v_T,$$

for $t = T-2, T-3, \ldots, 1$. The quantity $q_{t-1}$ is a weighted sum of the 1-step-ahead forecast errors $v_j$ occurring after time $t-1$. From the definition in the prior equation, $q_t$ can be computed recursively backward as

$$q_{t-1} = Z_t'V_t^{-1}v_t + L_t'q_t, \quad t = T, \ldots, 1, \tag{11.70}$$

with $q_T = 0$. Putting the equations together, we have a backward recursion for the smoothed state vectors as

$$q_{t-1} = Z_t'V_t^{-1}v_t + L_t'q_t, \quad s_{t|T} = s_{t|t-1} + \Sigma_{t|t-1}q_{t-1}, \quad t = T, \ldots, 1, \tag{11.71}$$

starting with $q_T = 0$, where $s_{t|t-1}$, $\Sigma_{t|t-1}$, $L_t$, and $V_t$ are available from the Kalman filter. This algorithm is referred to as the fixed interval smoother in the literature; see de Jong (1989) and the references therein.


Covariance Matrix of Smoothed State Vector 

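Next, we derive the covariance matrices of the smoothed state vectors. Applying Theorem 11.1(4) to the conditional joint distribution of $s_t$ and $\{v_t, \ldots, v_T\}$ given $F_{t-1}$, we have

$$\Sigma_{t|T} = \Sigma_{t|t-1} - \sum_{j=t}^{T}\mathrm{Cov}(s_t, v_j)[\mathrm{Var}(v_j)]^{-1}[\mathrm{Cov}(s_t, v_j)]'.$$

Using the covariance matrices in Eqs. (11.67) and (11.68), we further obtain

$$\begin{aligned}
\Sigma_{t|T} &= \Sigma_{t|t-1} - \Sigma_{t|t-1}Z_t'V_t^{-1}Z_t\Sigma_{t|t-1} - \Sigma_{t|t-1}L_t'Z_{t+1}'V_{t+1}^{-1}Z_{t+1}L_t\Sigma_{t|t-1} \\
&\quad - \cdots - \Sigma_{t|t-1}L_t' \cdots L_{T-1}'Z_T'V_T^{-1}Z_T L_{T-1} \cdots L_t\Sigma_{t|t-1} \\
&= \Sigma_{t|t-1} - \Sigma_{t|t-1}M_{t-1}\Sigma_{t|t-1},
\end{aligned}$$

where

$$M_{t-1} = Z_t'V_t^{-1}Z_t + L_t'Z_{t+1}'V_{t+1}^{-1}Z_{t+1}L_t + \cdots + L_t' \cdots L_{T-1}'Z_T'V_T^{-1}Z_T L_{T-1} \cdots L_t.$$

Again, $L_t' \cdots L_{T-1}' = I_m$ when $t = T$. From its definition, the $M_{t-1}$ matrix satisfies

$$M_{t-1} = Z_t'V_t^{-1}Z_t + L_t'M_t L_t, \quad t = T, \ldots, 1, \tag{11.72}$$

with the starting value $M_T = 0$. Collecting the results, we obtain a backward recursion to compute $\Sigma_{t|T}$ as

$$M_{t-1} = Z_t'V_t^{-1}Z_t + L_t'M_t L_t, \quad \Sigma_{t|T} = \Sigma_{t|t-1} - \Sigma_{t|t-1}M_{t-1}\Sigma_{t|t-1}, \tag{11.73}$$

for $t = T, \ldots, 1$ with $M_T = 0$. Note that, like that of the local trend model in Section 11.1, $M_t = \mathrm{Var}(q_t)$.
Combining the two backward recursions of smoothed state vectors, we have

$$\begin{aligned}
q_{t-1} &= Z_t'V_t^{-1}v_t + L_t'q_t, \\
s_{t|T} &= s_{t|t-1} + \Sigma_{t|t-1}q_{t-1}, \\
M_{t-1} &= Z_t'V_t^{-1}Z_t + L_t'M_t L_t, \\
\Sigma_{t|T} &= \Sigma_{t|t-1} - \Sigma_{t|t-1}M_{t-1}\Sigma_{t|t-1}, \quad t = T, \ldots, 1,
\end{aligned} \tag{11.74}$$

with $q_T = 0$ and $M_T = 0$.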

Suppose that the state-space model in Eqs. (11.26) and (11.27) is known. Application of the Kalman filter and state smoothing can proceed in two steps. First, the Kalman filter in Eq. (11.64) is used for $t = 1, \ldots, T$ and the quantities $v_t$, $V_t$, $K_t$, $s_{t|t-1}$, and $\Sigma_{t|t-1}$ are stored. Second, the state smoothing algorithm in Eq. (11.74) is applied for $t = T, T-1, \ldots, 1$ to obtain $s_{t|T}$ and $\Sigma_{t|T}$.
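The two-step procedure can be sketched in a few lines of Python for a scalar, time-invariant model with $c_t = d_t = 0$. The function name filter_and_smooth, the toy data, and the starting values are assumptions for illustration; this is not the SsfPack implementation. The forward loop follows Eq. (11.64) and the backward loop follows Eq. (11.74).

```python
def filter_and_smooth(y, T, Z, R, Q, H, s10, P10):
    """Forward Kalman filter, then backward state smoothing (scalar case)."""
    # Step 1: forward pass, storing v_t, V_t, L_t, s_{t|t-1}, Sigma_{t|t-1}
    s, P = s10, P10
    store = []
    for yt in y:
        v = yt - Z * s
        V = Z * P * Z + H
        K = T * P * Z / V
        L = T - K * Z
        store.append((v, V, L, s, P))
        s = T * s + K * v
        P = T * P * L + R * Q * R
    # Step 2: backward pass with q_T = 0 and M_T = 0
    q, M = 0.0, 0.0
    smoothed = [None] * len(y)
    for t in range(len(y) - 1, -1, -1):
        v, V, L, s_pred, P_pred = store[t]
        q = Z * v / V + L * q             # q_{t-1}
        M = Z * Z / V + L * M * L         # M_{t-1}
        smoothed[t] = (s_pred + P_pred * q,              # s_{t|T}
                       P_pred - P_pred * M * P_pred)     # Sigma_{t|T}
    return store, smoothed


y = [0.3, 0.1, -0.2, 0.4, 0.5]            # toy data, for illustration only
store, smoothed = filter_and_smooth(y, 1.0, 1.0, 1.0, 0.2 ** 2, 0.4 ** 2, 0.0, 1.0)
```

At the last time point the smoothed state coincides with the filtered state $s_{T|T}$, and at every t the smoothed variance is no larger than the prediction variance $\Sigma_{t|t-1}$.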


11.4.4 Disturbance Smoothing 


Let $e_{t|T} = E(e_t|F_T)$ and $\eta_{t|T} = E(\eta_t|F_T)$ be the smoothed disturbances of the observation and transition equations, respectively. These smoothed disturbances are useful in many applications, for example, in model checking. In this section, we study recursive algorithms to compute smoothed disturbances and their covariance matrices. Again, applying Theorem 11.1 to the conditional joint distribution of $e_t$ and $\{v_t, \ldots, v_T\}$ given $F_{t-1}$, we obtain

$$e_{t|T} = E(e_t|F_{t-1}, v_t, \ldots, v_T) = \sum_{j=t}^{T}E(e_t v_j')V_j^{-1}v_j, \tag{11.75}$$

where $E(e_t|F_{t-1}) = 0$ is used. Using Eq. (11.65),

$$E(e_t v_j') = E(e_t x_j')Z_j' + E(e_t e_j').$$

Since $E(e_t x_t') = 0$, we have

$$E(e_t v_j') = \begin{cases} H_t, & \text{if } j = t, \\ E(e_t x_j')Z_j', & \text{for } j = t+1, \ldots, T. \end{cases} \tag{11.76}$$
Using Eq. (11.65) repeatedly and the independence between $\{e_t\}$ and $\{\eta_t\}$, we obtain

$$\begin{aligned}
E(e_t x_{t+1}') &= -H_t K_t', \\
E(e_t x_{t+2}') &= -H_t K_t'L_{t+1}', \\
&\;\;\vdots \\
E(e_t x_T') &= -H_t K_t'L_{t+1}' \cdots L_{T-1}',
\end{aligned} \tag{11.77}$$

where it is understood that $L_{t+1}' \cdots L_{T-1}' = I_m$ if $t = T-1$. Based on Eqs. (11.76) and (11.77),

$$\begin{aligned}
e_{t|T} &= H_t(V_t^{-1}v_t - K_t'Z_{t+1}'V_{t+1}^{-1}v_{t+1} - \cdots - K_t'L_{t+1}' \cdots L_{T-1}'Z_T'V_T^{-1}v_T) \\
&= H_t(V_t^{-1}v_t - K_t'q_t) \\
&= H_t o_t, \quad t = T, \ldots, 1,
\end{aligned} \tag{11.78}$$

where $q_t$ is defined in Eq. (11.69) and $o_t = V_t^{-1}v_t - K_t'q_t$. We refer to $o_t$ as the smoothing measurement error.
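The smoothed disturbance $\eta_{t|T}$ can be derived analogously, and we have

$$\eta_{t|T} = \sum_{j=t}^{T}E(\eta_t v_j')V_j^{-1}v_j. \tag{11.79}$$

The state-space form in Eq. (11.65) gives

$$E(\eta_t v_j') = \begin{cases} Q_t R_t'Z_{t+1}', & \text{if } j = t+1, \\ E(\eta_t x_j')Z_j', & \text{if } j = t+2, \ldots, T, \end{cases}$$

where

$$\begin{aligned}
E(\eta_t x_{t+2}') &= Q_t R_t'L_{t+1}', \\
E(\eta_t x_{t+3}') &= Q_t R_t'L_{t+1}'L_{t+2}', \\
&\;\;\vdots \\
E(\eta_t x_T') &= Q_t R_t'L_{t+1}' \cdots L_{T-1}',
\end{aligned}$$

for $t = 1, \ldots, T$. Consequently, Eq. (11.79) implies

$$\begin{aligned}
\eta_{t|T} &= Q_t R_t'(Z_{t+1}'V_{t+1}^{-1}v_{t+1} + L_{t+1}'Z_{t+2}'V_{t+2}^{-1}v_{t+2} + \cdots + L_{t+1}' \cdots L_{T-1}'Z_T'V_T^{-1}v_T) \\
&= Q_t R_t'q_t, \quad t = T, \ldots, 1,
\end{aligned} \tag{11.80}$$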
where $q_t$ is defined earlier in Eq. (11.70).
Koopman (1993) uses the smoothed disturbance $\eta_{t|T}$ to derive a new recursion for computing $s_{t|T}$. From the transition equation in Eq. (11.26),

$$s_{t+1|T} = d_t + T_t s_{t|T} + R_t\eta_{t|T}.$$

Using Eq. (11.80), we have

$$s_{t+1|T} = d_t + T_t s_{t|T} + R_t Q_t R_t'q_t, \quad t = 1, \ldots, T, \tag{11.81}$$

where the initial value is $s_{1|T} = s_{1|0} + \Sigma_{1|0}q_0$ with $q_0$ obtained from the recursion in Eq. (11.70).


Covariance Matrices of Smoothed Disturbances 
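The covariance matrix of the smoothed disturbance can also be obtained using Theorem 11.1. Specifically,

$$\begin{aligned}
\mathrm{Var}(e_t|F_T) &= \mathrm{Var}(e_t|F_{t-1}, v_t, \ldots, v_T) \\
&= \mathrm{Var}(e_t|F_{t-1}) - \sum_{j=t}^{T}\mathrm{Cov}(e_t, v_j)V_j^{-1}[\mathrm{Cov}(e_t, v_j)]'.
\end{aligned}$$

Note that $\mathrm{Cov}(e_t, v_j) = E(e_t v_j')$, which is given in Eq. (11.76). Thus, we have

$$\begin{aligned}
\mathrm{Var}(e_t|F_T) &= H_t - H_t(V_t^{-1} + K_t'Z_{t+1}'V_{t+1}^{-1}Z_{t+1}K_t + K_t'L_{t+1}'Z_{t+2}'V_{t+2}^{-1}Z_{t+2}L_{t+1}K_t \\
&\quad + \cdots + K_t'L_{t+1}' \cdots L_{T-1}'Z_T'V_T^{-1}Z_T L_{T-1} \cdots L_{t+1}K_t)H_t \\
&= H_t - H_t(V_t^{-1} + K_t'M_t K_t)H_t \\
&= H_t - H_t N_t H_t,
\end{aligned}$$

where $N_t = V_t^{-1} + K_t'M_t K_t$ and $M_t$ is given in Eq. (11.72). Similarly,

$$\mathrm{Var}(\eta_t|F_T) = \mathrm{Var}(\eta_t) - \sum_{j=t}^{T}\mathrm{Cov}(\eta_t, v_j)V_j^{-1}[\mathrm{Cov}(\eta_t, v_j)]',$$

where $\mathrm{Cov}(\eta_t, v_j) = E(\eta_t v_j')$, which is given before when we derived the formula for $\eta_{t|T}$. Consequently,

$$\begin{aligned}
\mathrm{Var}(\eta_t|F_T) &= Q_t - Q_t R_t'(Z_{t+1}'V_{t+1}^{-1}Z_{t+1} + L_{t+1}'Z_{t+2}'V_{t+2}^{-1}Z_{t+2}L_{t+1} \\
&\quad + \cdots + L_{t+1}' \cdots L_{T-1}'Z_T'V_T^{-1}Z_T L_{T-1} \cdots L_{t+1})R_t Q_t \\
&= Q_t - Q_t R_t'M_t R_t Q_t.
\end{aligned}$$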
In summary, the disturbance smoothing algorithm is as follows:

$$\begin{aligned}
e_{t|T} &= H_t(V_t^{-1}v_t - K_t'q_t), \\
\eta_{t|T} &= Q_t R_t'q_t, \\
q_{t-1} &= Z_t'V_t^{-1}v_t + L_t'q_t, \\
\mathrm{Var}(e_t|F_T) &= H_t - H_t(V_t^{-1} + K_t'M_t K_t)H_t, \\
\mathrm{Var}(\eta_t|F_T) &= Q_t - Q_t R_t'M_t R_t Q_t, \\
M_{t-1} &= Z_t'V_t^{-1}Z_t + L_t'M_t L_t, \quad t = T, \ldots, 1,
\end{aligned} \tag{11.82}$$

where $q_T = 0$ and $M_T = 0$.
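The algorithm in Eq. (11.82) can be illustrated with a scalar, time-invariant model with $c_t = d_t = 0$. In the Python sketch below the function name disturbance_smoother, the toy data, and the parameter values are assumptions for illustration only. Two identities make convenient sanity checks: the smoothed observation disturbance equals the smoothed residual, $e_{t|T} = y_t - Z s_{t|T}$, and the smoothed states satisfy the recursion in Eq. (11.81), $s_{t+1|T} = T s_{t|T} + R\eta_{t|T}$.

```python
def disturbance_smoother(y, T, Z, R, Q, H, s10, P10):
    """Scalar disturbance smoother: forward filter, then backward recursion."""
    s, P = s10, P10
    fwd = []
    for yt in y:                          # forward pass, Eq. (11.64)
        v = yt - Z * s
        V = Z * P * Z + H
        K = T * P * Z / V
        L = T - K * Z
        fwd.append((v, V, K, L, s, P))
        s = T * s + K * v
        P = T * P * L + R * Q * R
    n = len(y)
    e_hat, eta_hat = [0.0] * n, [0.0] * n
    var_e, var_eta, s_smooth = [0.0] * n, [0.0] * n, [0.0] * n
    q, M = 0.0, 0.0                       # q_T = 0 and M_T = 0
    for t in range(n - 1, -1, -1):        # backward pass, Eq. (11.82)
        v, V, K, L, s_pred, P_pred = fwd[t]
        e_hat[t] = H * (v / V - K * q)                 # uses q_t
        eta_hat[t] = Q * R * q                         # uses q_t
        var_e[t] = H - H * (1.0 / V + K * M * K) * H   # uses M_t
        var_eta[t] = Q - Q * R * M * R * Q             # uses M_t
        q = Z * v / V + L * q                          # q_{t-1}
        M = Z * Z / V + L * M * L                      # M_{t-1}
        s_smooth[t] = s_pred + P_pred * q              # s_{t|T}
    return e_hat, eta_hat, var_e, var_eta, s_smooth


y = [0.3, 0.1, -0.2, 0.4, 0.5]            # toy data, for illustration only
T_, Z_, R_, Q_, H_ = 0.9, 1.0, 1.0, 0.04, 0.16
e_hat, eta_hat, var_e, var_eta, s_smooth = disturbance_smoother(
    y, T_, Z_, R_, Q_, H_, s10=0.0, P10=1.0)
```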


11.5 MISSING VALUES 


For the general state-space model in Eqs. (11.26) and (11.27), we consider two cases of missing values. First, suppose that, similar to the local trend model in Section 11.1, the observations $y_t$ at $t = \ell+1, \ldots, \ell+h$ are missing. In this case, there is no new information available at these time points and we set

$$v_t = 0, \quad K_t = 0, \quad \text{for } t = \ell+1, \ldots, \ell+h.$$

The Kalman filter in Eq. (11.64) can then proceed as usual. That is,

$$s_{t+1|t} = d_t + T_t s_{t|t-1}, \quad \Sigma_{t+1|t} = T_t\Sigma_{t|t-1}T_t' + R_t Q_t R_t',$$

for $t = \ell+1, \ldots, \ell+h$. Similarly, the smoothed state vectors can be computed as usual via Eq. (11.74) with

$$q_{t-1} = T_t'q_t, \quad M_{t-1} = T_t'M_t T_t,$$

for $t = \ell+1, \ldots, \ell+h$.

In the second case, some components of $y_t$ are missing. Let $y_t^* = Jy_t$ be the vector of observed data at time t, where J is an indicator matrix identifying the observed data. More specifically, rows of J are a subset of the rows of the $k \times k$ identity matrix. In this case, the observation equation (11.27) of the model can be transformed as

$$y_t^* = c_t^* + Z_t^*s_t + e_t^*,$$

where $c_t^* = Jc_t$, $Z_t^* = JZ_t$, and $e_t^* = Je_t$ with covariance matrix $\mathrm{Var}(e_t^*) = H_t^* = JH_tJ'$. The Kalman filter and state-smoothing recursion continue to apply except that the modified observation equation is used at time t. Consequently, the ease in handling missing values is a nice feature of the state-space model.
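The first case is easy to mimic in code. The Python sketch below handles a scalar, time-invariant model with $c_t = d_t = 0$ and marks missing observations by None; the function name kalman_filter_missing and the toy data are assumptions for illustration. When $y_t$ is missing, the update uses $v_t = 0$ and $K_t = 0$, so for the local trend model the prediction variance simply grows by $\sigma_\eta^2$ at each missing time point.

```python
def kalman_filter_missing(y, T, Z, R, Q, H, s10, P10):
    """Scalar Kalman filter where None marks a missing observation."""
    s, P = s10, P10
    preds = []                        # (s_{t|t-1}, Sigma_{t|t-1}) for each t
    for yt in y:
        preds.append((s, P))
        if yt is None:                # missing: v_t = 0 and K_t = 0
            s = T * s
            P = T * P * T + R * Q * R
        else:                         # usual update of Eq. (11.64)
            v = yt - Z * s
            V = Z * P * Z + H
            K = T * P * Z / V
            s = T * s + K * v
            P = T * P * (T - K * Z) + R * Q * R
    preds.append((s, P))              # prediction for the period after the sample
    return preds


Q_m, H_m = 0.2 ** 2, 0.4 ** 2
y_m = [0.3, 0.1, None, None, 0.5]     # toy data with two missing values
preds = kalman_filter_missing(y_m, 1.0, 1.0, 1.0, Q_m, H_m, 0.0, 1.0)
```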
11.6 FORECASTING


Suppose that the forecast origin is t and we are interested in predicting $y_{t+j}$ for $j = 1, \ldots, h$, where $h > 0$. Also, we adopt the minimum mean-squared error forecasts. Similar to the ARMA models, the j-step-ahead forecast $y_t(j)$ turns out to be the expected value of $y_{t+j}$ given $F_t$ and the model. That is, $y_t(j) = E(y_{t+j}|F_t)$. In what follows, we show that these forecasts and the covariance matrices of the associated forecast errors can be obtained via the Kalman filter in Eq. (11.64) by treating $\{y_{t+1}, \ldots, y_{t+h}\}$ as missing values, that is, the first case in Section 11.5.
Consider the 1-step-ahead forecast. From Eq. (11.27),

$$y_t(1) = E(y_{t+1}|F_t) = c_{t+1} + Z_{t+1}s_{t+1|t},$$

where $s_{t+1|t}$ is available via the Kalman filter at the forecast origin t. The associated forecast error is

$$e_t(1) = y_{t+1} - y_t(1) = Z_{t+1}(s_{t+1} - s_{t+1|t}) + e_{t+1}.$$

Therefore, the covariance matrix of the 1-step-ahead forecast error is

$$\mathrm{Var}[e_t(1)] = Z_{t+1}\Sigma_{t+1|t}Z_{t+1}' + H_{t+1}.$$

This is precisely the covariance matrix $V_{t+1}$ of the Kalman filter in Eq. (11.64). Thus, we have shown the case for h = 1.

Now, for h > 1, we consider 1-step- to h-step-ahead forecasts sequentially. From Eq. (11.27), the j-step-ahead forecast is

$$y_t(j) = c_{t+j} + Z_{t+j}s_{t+j|t}, \tag{11.83}$$

and the associated forecast error is

$$e_t(j) = Z_{t+j}(s_{t+j} - s_{t+j|t}) + e_{t+j}.$$

Recall that $s_{t+j|t}$ and $\Sigma_{t+j|t}$ are, respectively, the conditional mean and covariance matrix of $s_{t+j}$ given $F_t$. The prior equation says that

$$\mathrm{Var}[e_t(j)] = Z_{t+j}\Sigma_{t+j|t}Z_{t+j}' + H_{t+j}. \tag{11.84}$$

Furthermore, from Eq. (11.26),

$$s_{t+j+1|t} = d_{t+j} + T_{t+j}s_{t+j|t},$$

which in turn implies that

$$s_{t+j+1} - s_{t+j+1|t} = T_{t+j}(s_{t+j} - s_{t+j|t}) + R_{t+j}\eta_{t+j}.$$
Consequently,

$$\Sigma_{t+j+1|t} = T_{t+j}\Sigma_{t+j|t}T_{t+j}' + R_{t+j}Q_{t+j}R_{t+j}'. \tag{11.85}$$

Note that $\mathrm{Var}[e_t(j)] = V_{t+j}$ and Eqs. (11.83) and (11.85) are the recursion of the Kalman filter in Eq. (11.64) for $t + j$ with $j = 1, \ldots, h$ when $v_{t+j} = 0$ and $K_{t+j} = 0$. Thus, the forecast $y_t(j)$ and the covariance matrix of its forecast error $e_t(j)$ can be obtained via the Kalman filter with missing values.

Finally, the prediction error series $\{v_t\}$ can be used to evaluate the likelihood function for estimation and the standardized prediction errors $D_t^{-1/2}v_t$ can be used for model checking, where $D_t = \mathrm{diag}\{V_t(1, 1), \ldots, V_t(k, k)\}$ with $V_t(i, i)$ being the (i, i)th element of $V_t$.
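The forecasting recursion in Eqs. (11.83) to (11.85) can be sketched for a scalar, time-invariant model as follows. The Python function name forecast and the numerical inputs are hypothetical; in practice $s_{t+1|t}$ and $\Sigma_{t+1|t}$ would come from the Kalman filter at the forecast origin.

```python
def forecast(s_next, P_next, h, T, Z, R, Q, H, c=0.0, d=0.0):
    """Return [(y_t(j), Var[e_t(j)])] for j = 1, ..., h (scalar case).

    s_next and P_next are s_{t+1|t} and Sigma_{t+1|t} at the forecast origin.
    """
    out = []
    s, P = s_next, P_next
    for _ in range(h):
        out.append((c + Z * s, Z * P * Z + H))   # Eqs. (11.83) and (11.84)
        s = d + T * s                            # s_{t+j+1|t}
        P = T * P * T + R * Q * R                # Eq. (11.85)
    return out


# Local trend model with sigma_eta^2 = 0.04 and sigma_e^2 = 0.16;
# the values 0.25 and 0.10 are made-up filter output, for illustration only.
fc = forecast(s_next=0.25, P_next=0.10, h=3, T=1.0, Z=1.0, R=1.0, Q=0.04, H=0.16)
for j, (mean, var) in enumerate(fc, start=1):
    print(j, mean, var)
```

For the local trend model the point forecasts stay flat while the forecast error variance increases by $\sigma_\eta^2$ with each additional step.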


11.7 APPLICATION 


In this section, we consider some applications of the state-space model in finance 
and business. Our objectives are to highlight the applicability of the model and to 
demonstrate the practical implementation of the analysis in S-Plus with SsfPack. 


Example 11.2. Consider the CAPM for the monthly simple excess returns of 
General Motors (GM) stock from January 1990 to December 2003; see Chapter 9. 
We use the simple excess returns of the S&P 500 composite index as the market 
returns. The returns are in percentages. Our illustration starts with a simple market 
model 


$$r_t = \alpha + \beta r_{M,t} + e_t, \quad e_t \sim N(0, \sigma_e^2), \tag{11.86}$$


for t = 1,..., 168. This is a fixed-coefficient model and can easily be estimated 
by the ordinary least-squares (OLS) method. Denote the GM stock return and the 
market return by gm and sp, respectively. The result follows: 


> da=read.table("m-gmsp-excess-9003.txt",header=F)
> gm=da[,1]
> sp=da[,2]
> fit=OLS(gm~sp)
> summary(fit)

Call:
OLS(formula = gm~sp)
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept)  0.1982     0.6302  0.3145   0.7535
         sp  1.0457     0.1453  7.1962   0.0000

Regression Diagnostics:
         R-Squared 0.2378
Adjusted R-Squared 0.2332
Durbin-Watson Stat 2.0290

Residual Diagnostics:
               Stat P-Value
Jarque-Bera  2.5348  0.2816
  Ljung-Box 24.2132  0.3362

Residual standard error: 8.13 on 166 degrees of freedom

Thus, the fitted model is

$$r_t = 0.20 + 1.0457r_{M,t} + e_t, \quad \hat{\sigma}_e = 8.13.$$

Based on the residual diagnostics, the model appears to be adequate for the GM stock returns with adjusted $R^2 = 23.3\%$.

As shown in Section 11.3, model (11.86) is a special case of the state-space 
model. We then estimate the model using SsfPack. The result is as follows: 


> reg.m=function(parm,mX=NULL){
+ parm=exp(parm) % log(sigma.e^2) is used to ensure positiveness.
+ ssf.reg=GetSsfReg(mX)
+ ssf.reg$mOmega[3,3]=parm[1]
+ CheckSsf(ssf.reg)
+ }
> c.start=c(10)
> X.mtx=cbind(rep(1,168),sp)
> reg.fit=SsfFit(c.start,gm,"reg.m",mX=X.mtx)
RELATIVE FUNCTION CONVERGENCE
> names(reg.fit)
 [1] "parameters" "objective" "message" "grad.norm" "iterations"
 [6] "f.evals" "g.evals" "hessian" "scale" "aux"
[11] "call" "vcov"
> sqrt(exp(reg.fit$parameters))
[1] 8.130114
> ssf.reg$mOmega[3,3]=exp(reg.fit$parameters)
> reg.s=SsfMomentEst(gm,ssf.reg,task="STSMO")
> reg.s$state.moment[10,]
  state.1  state.2
0.1982025 1.045702
> sqrt(reg.s$state.variance[10,])
  state.1   state.2
0.6302091 0.1453139


As expected, the result is in total agreement with that of the OLS method. 
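The agreement with OLS is not specific to SsfPack. The Python sketch below illustrates it for the simplest case of a regression through the origin, $y_t = \beta x_t + e_t$, treated as a state-space model with a constant state ($T_t = 1$, $Q_t = 0$, $Z_t = x_t$). The function name kf_regression, the toy data, and the large initial variance used to approximate a diffuse initialization are all assumptions made here; the GM data are not used.

```python
def kf_regression(y, x, H, P10=1e6):
    """Kalman filter for y_t = beta*x_t + e_t with a constant state beta."""
    b, P = 0.0, P10               # vague prior standing in for a diffuse start
    for yt, xt in zip(y, x):
        v = yt - xt * b           # forecast error
        V = xt * P * xt + H       # forecast error variance
        C = P * xt                # Cov(state, v_t)
        b = b + C * v / V         # filtered state; since T = 1, also the next prediction
        P = P - C * C / V         # filtered variance; Q = 0, so it carries over
    return b, P


x = [1.0, 2.0, 3.0, 4.0, 5.0]     # toy regressor, for illustration only
y = [1.1, 1.9, 3.2, 3.9, 5.1]     # toy response, for illustration only
b_kf, P_kf = kf_regression(y, x, H=1.0)
b_ols = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
print(b_kf, b_ols)
```

With the error variance H held fixed, the final state variance approaches $H/\sum x_t^2$, the variance of the least-squares estimator.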
Finally, we entertain the time-varying CAPM of Section 11.3.1. The estimation result, including time plot of the smoothed response variable, is given below. The command SsfCondDens is used to compute the smoothed estimates of the state vector and observation without variance estimation.


> tv.capm=function(parm,mX=NULL){ % setup model for estimation
+ parm=exp(parm) % parameterize in log for positiveness.
+ Phi.t=rbind(diag(2),rep(0,2))
+ Omega=diag(parm)
+ JPhi=matrix(-1,3,2)
+ JPhi[3,1]=1
+ JPhi[3,2]=2
+ Sigma=-Phi.t
+ ssf.tv=list(mPhi=Phi.t,
+ mOmega=Omega,
+ mJPhi=JPhi,
+ mSigma=Sigma,
+ mX=mX)
+ CheckSsf(ssf.tv)
+ }
> tv.start=c(0,0,0) % starting values
> tv.mle=SsfFit(tv.start,gm,"tv.capm",mX=X.mtx) % estimation
> sigma.mle=sqrt(exp(tv.mle$parameters))
> sigma.mle
[1] 4.907845e-05 1.219885e-02 8.125213e+00
% Smoothing
> smoEst.tv=SsfCondDens(gm,tv.capm(tv.mle$parameters,mX=X.mtx),
+ task="STSMO")
> names(smoEst.tv)
[1] "state" "response" "task"
> par(mfcol=c(2,2)) % plotting
> plot(gm,type='l',ylab='excess return')
> title(main='(a) Monthly simple excess returns')
> plot(smoEst.tv$response,type='l',ylab='rtn')
> title(main='(b) Expected returns')
> plot(smoEst.tv$state[,1],type='l',ylab='value')
> title(main='(c) Alpha(t)')
> plot(smoEst.tv$state[,2],type='l',ylab='value')
> title(main='(d) Beta(t)')


Note that estimates of $\sigma_\eta$ and $\sigma_\varepsilon$ are $4.91 \times 10^{-5}$ and $1.22 \times 10^{-2}$, respectively. These estimates are close to zero, indicating that $\alpha_t$ and $\beta_t$ of the time-varying market model are essentially constant for the GM stock returns. This is in agreement with the fact that the fixed-coefficient market model fits the data well. Figure 11.5 shows some plots for the time-varying CAPM fit. Part (a) is the monthly simple excess returns of GM stock from January 1990 to December 2003. Part (b) is the expected returns of GM stock, that is, $r_{t|T}$, where T = 168 is the sample size. Parts (c) and (d) are the time plots of the estimates of $\alpha_t$ and $\beta_t$. Given the tightness in the vertical scale, these two time plots confirm the assertion that a fixed-coefficient market model is adequate for the monthly GM stock return.

Figure 11.5 Time plots of some statistics for time-varying CAPM applied to monthly simple excess returns of General Motors stock. S&P 500 composite index return is used as market return: (a) monthly simple excess return, (b) expected returns $r_{t|T}$, (c) $\alpha_t$ estimate, and (d) $\beta_t$ estimate.

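Example 11.3. In this example we reanalyze the series of quarterly earnings per share of Johnson & Johnson from 1960 to 1980 using the unobserved component model; see Chapter 2 for details of the data. The model considered is

$$y_t = \mu_t + \gamma_t + e_t, \quad e_t \sim N(0, \sigma_e^2), \tag{11.87}$$

where $y_t$ is the logarithm of the observed earnings per share, $\mu_t$ is the local trend component satisfying

$$\mu_{t+1} = \mu_t + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2),$$

and $\gamma_t$ is the seasonal component that satisfies

$$(1 + B + B^2 + B^3)\gamma_t = \omega_t, \quad \omega_t \sim N(0, \sigma_\omega^2),$$

that is, $\gamma_t = -\sum_{j=1}^{3}\gamma_{t-j} + \omega_t$. This model has three parameters, $\sigma_e$, $\sigma_\eta$, and $\sigma_\omega$, and is a simple unobserved component model. It can be put in a state-space form as

$$\begin{bmatrix} \mu_{t+1} \\ \gamma_{t+1} \\ \gamma_t \\ \gamma_{t-1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \mu_t \\ \gamma_t \\ \gamma_{t-1} \\ \gamma_{t-2} \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \eta_t \\ \omega_t \end{bmatrix},$$

where the covariance matrix of $(\eta_t, \omega_t)'$ is $\mathrm{diag}\{\sigma_\eta^2, \sigma_\omega^2\}$, and $y_t = [1, 1, 0, 0]s_t + e_t$; see Section 11.3. This is a special case of the structural time series in SsfPack and can easily be specified using the command GetSsfStsm. Performing the maximum-likelihood estimation, we obtain $(\hat{\sigma}_e, \hat{\sigma}_\eta, \hat{\sigma}_\omega) = (2.04 \times 10^{-6}, 7.27 \times 10^{-2}, 2.93 \times 10^{-2})$.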


> jnj=scan(file='q-jnj.txt')
> y=log(jnj)
% Estimation
> jnj.m=function(parm){
+ parm=exp(parm)
+ jnj.sea=GetSsfStsm(irregular=parm[1],level=parm[2],
+ seasonalDummy=c(parm[3],4))
+ CheckSsf(jnj.sea)
+ }
> c.start=c(0,0,0) % Starting values
> jnj.est=SsfFit(c.start,y,"jnj.m")
> names(jnj.est)
 [1] "parameters" "objective" "message" "grad.norm" "iterations"
 [6] "f.evals" "g.evals" "hessian" "scale" "aux"
[11] "call"
> jnjest=exp(jnj.est$parameters)
> jnjest % estimates
[1] 2.044516e-06 7.269655e-02 2.931691e-02
> jnj.ssf=GetSsfStsm(irregular=jnjest[1],level=jnjest[2],
+ seasonalDummy=c(jnjest[3],4)) % specify the model with estimates
> CheckSsf(jnj.ssf)
$mPhi:
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0   -1   -1   -1
[3,]    0    1    0    0
[4,]    0    0    1    0
[5,]    1    1    0    0
$mOmega:
            [,1]        [,2] [,3] [,4]         [,5]
[1,] 0.005284788 0.000000000    0    0 0.000000e+00
[2,] 0.000000000 0.000859481    0    0 0.000000e+00
[3,] 0.000000000 0.000000000    0    0 0.000000e+00
[4,] 0.000000000 0.000000000    0    0 0.000000e+00
[5,] 0.000000000 0.000000000    0    0 4.180047e-12
...
$mJPhi:
[1] 0
$mJOmega:
[1] 0
$mJDelta:
[1] 0
$mX:
...
attr(, "class"):
[1] "ssf"
% Smoothed components
> jnj.smo=SsfMomentEst(y,jnj.ssf,task="STSMO")
> up1=jnj.smo$state.moment[,1]+
+ 2*sqrt(jnj.smo$state.variance[,1])
> lw1=jnj.smo$state.moment[,1]-
+ 2*sqrt(jnj.smo$state.variance[,1])
> max(up1) % obtain the range for plotting
[1] 2.795702
> min(lw1)
[1] -0.5948943
> up=jnj.smo$state.moment[,2]+
+ 2*sqrt(jnj.smo$state.variance[,2])
> lw=jnj.smo$state.moment[,2]-
+ 2*sqrt(jnj.smo$state.variance[,2])
> max(up)
[1] 0.3788652
> min(lw)
[1] -0.3552441
> par(mfcol=c(2,1)) % plotting
> plot(tdx,jnj.smo$state.moment[,1],type='l',xlab='year',
+ ylab='value',ylim=c(-1,3))
> lines(tdx,up1,lty=2)
> lines(tdx,lw1,lty=2)
> title(main='(a) Trend component')
> plot(tdx,jnj.smo$state.moment[,2],type='l',xlab='year',
+ ylab='value',ylim=c(-.5,.5))
> lines(tdx,up,lty=2)
> lines(tdx,lw,lty=2)
> title(main='(b) Seasonal component')
% Filtering and smoothing
> jnj.fil=KalmanFil(y,jnj.ssf,task="STFIL")
> jnj.smo1=KalmanSmo(jnj.fil,jnj.ssf)
> plot(tdx,jnj.fil$mOut[,1],type='l',xlab='year',ylab='resi')
> title(main='(a) 1-Step forecast error')
> plot(tdx,jnj.smo1$response.residuals[2:85],type='l',
+ xlab='year',ylab='resi')
> title(main='(b) Smoothing residual')


Figure 11.6 shows the smoothed estimates of the trend and seasonal components, that is, $\mu_{t|T}$ and $\gamma_{t|T}$ with T = 84, of the data. Of particular interest is that the seasonal pattern seems to evolve over time. Also shown are 95% pointwise confidence regions of the unobserved components. Figure 11.7 shows the residual plots, where part (a) gives the 1-step-ahead forecast errors computed by the Kalman filter and part (b) is the smoothed response residuals of the fitted model. Thus, state-space modeling provides an alternative approach for analyzing seasonal time series. It should be noted that the estimated components in Figure 11.6 are not unique. They depend on the model specified and the constraints used. In fact, there are infinitely many ways to decompose an observed time series into unobserved components. For instance, one can use a different specification for the seasonal component, for example, seasonalTrig in SsfPack, to obtain another decomposition for the earnings series of Johnson & Johnson. Thus, care must be exercised in interpreting the estimated components. However, for forecasting purposes, the choice of decomposition does not matter provided that the chosen one is a valid decomposition.
Figure 11.6 Smoothed components of fitting model (11.87) to logarithm of quarterly earnings per share of Johnson & Johnson from 1960 to 1980: (a) trend component and (b) seasonal component. Dotted lines indicate pointwise 95% confidence regions.

Figure 11.7 Residual series of fitting model (11.87) to logarithm of quarterly earnings per share of Johnson & Johnson from 1960 to 1980: (a) 1-step-ahead forecast error $v_t$ and (b) smoothed residuals of response variable.


EXERCISES


11.1. Consider the ARMA(1,1) model $y_t - 0.8y_{t-1} = a_t + 0.4a_{t-1}$ with $a_t \sim N(0, 0.49)$. Convert the model into a state-space form using (a) Akaike's method, (b) Harvey's approach, and (c) Aoki's approach.

11.2. The file aa-rv-20m.txt contains the realized daily volatility series of Alcoa stock returns from January 2, 2003, to May 7, 2004; see the example in Section 11.1. The volatility series is constructed using 20-minute intradaily log returns.

(a) Fit an ARIMA(0,1,1) model to the log volatility series and write down the model.

(b) Estimate the local trend model in Eqs. (11.1) and (11.2) for the log volatility series. What are the estimates of $\sigma_e$ and $\sigma_\eta$? Obtain time plots for the filtered and smoothed state variables with pointwise 95% confidence interval.

11.3. Consider the monthly simple excess returns of Pfizer stock and the S&P 500 composite index from January 1990 to December 2003. The excess returns are in m-pfesp-ex9003.txt with Pfizer stock returns in the first column.

(a) Fit a fixed-coefficient market model to the Pfizer stock return. Write down the fitted model.

(b) Fit a time-varying CAPM to the Pfizer stock return. What are the estimated standard errors of the innovations to the $\alpha_t$ and $\beta_t$ series? Obtain time plots of the smoothed estimates of $\alpha_t$ and $\beta_t$.

11.4. Consider the AR(3) model

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \phi_3 x_{t-3} + a_t, \quad a_t \sim N(0, \sigma_a^2),$$

and suppose that the observed data are

$$y_t = x_t + e_t, \quad e_t \sim N(0, \sigma_e^2),$$

where $\{e_t\}$ and $\{a_t\}$ are independent and the initial values of $x_j$ with $j \le 0$ are independent of $e_t$ and $a_t$ for $t > 0$.

(a) Convert the model into a state-space form.
(b) If $E(e_t) = c$, which is not zero, what is the corresponding state-space form for the system?

11.5. The file m-ppiaco4709.txt contains year, month, day, and U.S. producer 
price index (PPI) from January 1947 to November 2009. The index is for all 
commodities and not seasonally adjusted. Let $z_t = \ln(Z_t) - \ln(Z_{t-1})$, where 
$Z_t$ is the observed monthly PPI. It turns out that an AR(3) model is adequate 
for $z_t$ if the minor seasonal dependence is ignored. Let $y_t$ be the sample 
mean-corrected series of $z_t$. 

(a) Fit an AR(3) model to $y_t$ and write down the fitted model. 

(b) Suppose that $y_t$ has independent measurement errors so that $y_t = x_t + e_t$, 
where $x_t$ is a zero-mean AR(3) process and $\mathrm{Var}(e_t) = \sigma_e^2$. Use a 
state-space form to estimate parameters, including the innovational variances 
to the state and $\sigma_e^2$. Write down the fitted model and obtain a time plot of 
the smoothed estimate of $x_t$. Also, show the time plot of filtered response 
residuals of the fitted state-space model. 

CHAPTER 12 


Markov Chain Monte Carlo Methods 
with Applications 


Advances in computing facilities and computational methods have dramatically 
increased our ability to solve complicated problems. The advances also extend the 
applicability of many existing econometric and statistical methods. Examples of 
such achievements in statistics include the Markov chain Monte Carlo (MCMC) 
method and data augmentation. These techniques enable us to make some statistical 
inference that was not feasible just a few years ago. In this chapter, we introduce 
the ideas of MCMC methods and data augmentation that are widely applicable 
in finance. In particular, we discuss Bayesian inference via Gibbs sampling and 
demonstrate various applications of MCMC methods. Rapid developments in the 
MCMC methodology make it impossible to cover all the new methods available in 
the literature. Interested readers are referred to some recent books on Bayesian and 
empirical Bayesian statistics (e.g., Carlin and Louis, 2000; Gelman, Carlin, Stern, 
and Rubin, 2003). 

For applications, we focus on issues related to financial econometrics. The 
demonstrations shown in this chapter represent only a small fraction of all possible 
applications of the techniques in finance. As a matter of fact, it is fair to say that 
Bayesian inference and the MCMC methods discussed here are applicable to most, 
if not all, of the studies in financial econometrics. 

We begin the chapter by reviewing the concept of a Markov process. Consider 
a stochastic process $\{X_t\}$, where each $X_t$ assumes a value in the space $\Theta$. The 
process $\{X_t\}$ is a Markov process if it has the property that, given the value of $X_t$, 
the values of $X_h$, $h > t$, do not depend on the values $X_s$, $s < t$. In other words, 
$\{X_t\}$ is a Markov process if its conditional distribution function satisfies 

$$ P(X_h \mid X_s, s \le t) = P(X_h \mid X_t), \qquad h > t. $$

If $\{X_t\}$ is a discrete-time stochastic process, then the prior property becomes 

$$ P(X_h \mid X_t, X_{t-1}, \ldots) = P(X_h \mid X_t), \qquad h > t. $$

Let $A$ be a subset of $\Theta$. The function 

$$ P_t(\theta, h, A) = P(X_h \in A \mid X_t = \theta), \qquad h > t $$

is called the transition probability function of the Markov process. If the transition 
probability depends on $h - t$, but not on $t$, then the process has a stationary 
transition distribution. 
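
As a small illustration of a discrete-time Markov process whose transition probabilities do not depend on $t$, the following Python sketch propagates the distribution of $X_t$ through a two-state chain. The transition matrix `P` and the function names are hypothetical choices made only for this example; they do not come from the text. The point of the sketch is that, for this chain, the distribution of the state settles to the same limit regardless of the starting state.

```python
def step_distribution(p, P):
    """One transition of the chain: returns the row vector p multiplied by P."""
    k = len(P)
    return [sum(p[i] * P[i][j] for i in range(k)) for j in range(k)]


def distribution_after(p0, P, steps):
    """Distribution of the state after `steps` transitions, starting from p0."""
    p = list(p0)
    for _ in range(steps):
        p = step_distribution(p, P)
    return p


# Hypothetical two-state transition matrix; row i holds P(X_{t+1} = j | X_t = i).
# The entries do not depend on t, so the transition distribution is stationary.
P = [[0.9, 0.1],
     [0.5, 0.5]]

from_state_0 = distribution_after([1.0, 0.0], P, 50)
from_state_1 = distribution_after([0.0, 1.0], P, 50)
print(from_state_0, from_state_1)  # both are close to (5/6, 1/6)
```

The limit $(5/6, 1/6)$ solves $\pi = \pi P$ for this matrix, which can be checked by hand from $0.1\pi_0 = 0.5\pi_1$.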


12.1 MARKOV CHAIN SIMULATION 


Consider an inference problem with parameter vector $\boldsymbol{\theta}$ and data $X$, where $\boldsymbol{\theta} \in \Theta$. 
To make inference, we need to know the distribution $P(\boldsymbol{\theta}|X)$. The idea of Markov 
chain simulation is to simulate a Markov process on $\Theta$, which converges to a 
stationary transition distribution that is $P(\boldsymbol{\theta}|X)$. 

The key to Markov chain simulation is to create a Markov process whose stationary 
transition distribution is a specified $P(\boldsymbol{\theta}|X)$ and run the simulation sufficiently 
long so that the distribution of the current values of the process is close enough to 
the stationary transition distribution. It turns out that, for a given $P(\boldsymbol{\theta}|X)$, many 
Markov chains with the desired property can be constructed. We refer to methods 
that use Markov chain simulation to obtain the distribution $P(\boldsymbol{\theta}|X)$ as MCMC 
methods. 

The development of MCMC methods took place in various forms in the sta- 
tistical literature. Consider the problem of “missing value” in data analysis. Most 
statistical methods discussed in this book were developed under the assumption of 
“complete data” (i.e., there is no missing value). For example, in modeling daily 
volatility of an asset return, we assume that the return data are available for all 
trading days in the sample period. What should we do if there is a missing value? 

Dempster, Laird, and Rubin (1977) suggest an iterative method called the 
Expectation-Maximization (EM) algorithm to solve the problem. The method 
consists of two steps. First, if the missing value were available, then we could use 
methods of complete-data analysis to build a volatility model. Second, given the 
available data and the fitted model, we can derive the statistical distribution of the 
missing value. A simple way to fill in the missing value is to use the conditional 
expectation of the derived distribution of the missing value. In practice, one can 
start the method with an arbitrary value for the missing value and iterate the 
procedure many times until convergence. The first step of the prior 
procedure involves performing the maximum-likelihood estimation of a specified 
model and is called the M-step. The second step is to compute the conditional 
expectation of the missing value and is called the E-step. 

Tanner and Wong (1987) generalize the EM algorithm in two ways. First, they 
introduce the idea of iterative simulation. For instance, instead of using the con- 
ditional expectation, one can simply replace the missing value by a random draw 
from its derived conditional distribution. Second, they extend the applicability of 
the EM algorithm by using the concept of data augmentation. By data augmentation, 
we mean adding auxiliary variables to the problem under study. It turns out 
that many of the simulation methods can be simplified or sped up by data 
augmentation; see the application sections of this chapter. 


12.2 GIBBS SAMPLING 


Gibbs sampling (or Gibbs sampler) of Geman and Geman (1984) and Gelfand and 
Smith (1990) is perhaps the most popular MCMC method. We introduce the idea 
of Gibbs sampling by using a simple problem with three parameters. Here the word 
parameter is used in a very general sense. A missing data point can be regarded 
as a parameter under the MCMC framework. Similarly, an unobservable variable 
such as the “true” price of an asset can be regarded as N parameters when there 
are N transaction prices available. This concept of parameter is related to data 
augmentation and becomes apparent when we discuss applications of the MCMC 
methods. 

Denote the three parameters by $\theta_1$, $\theta_2$, and $\theta_3$. Let $X$ be the collection of available 
data and $M$ the entertained model. The goal here is to estimate the parameters so 
that the fitted model can be used to make inference. Suppose that the likelihood 
function of the model is hard to obtain, but the three conditional distributions of 
a single parameter given the others are available. In other words, we assume that 
the following three conditional distributions are known: 

$$ f_1(\theta_1|\theta_2, \theta_3, X, M), \quad f_2(\theta_2|\theta_3, \theta_1, X, M), \quad f_3(\theta_3|\theta_1, \theta_2, X, M), \tag{12.1} $$

where $f_i(\theta_i|\theta_{j \ne i}, X, M)$ denotes the conditional distribution of the parameter $\theta_i$ 
given the data, the model, and the other two parameters. In application, we do not 
need to know the exact forms of the conditional distributions. What is needed is the 
ability to draw a random number from each of the three conditional distributions. 

Let $\theta_{2,0}$ and $\theta_{3,0}$ be two arbitrary starting values of $\theta_2$ and $\theta_3$. The Gibbs sampler 
proceeds as follows: 

1. Draw a random sample from $f_1(\theta_1|\theta_{2,0}, \theta_{3,0}, X, M)$. Denote the random draw 
by $\theta_{1,1}$. 

2. Draw a random sample from $f_2(\theta_2|\theta_{3,0}, \theta_{1,1}, X, M)$. Denote the random draw 
by $\theta_{2,1}$. 

3. Draw a random sample from $f_3(\theta_3|\theta_{1,1}, \theta_{2,1}, X, M)$. Denote the random draw 
by $\theta_{3,1}$. 

This completes a Gibbs iteration and the parameters become $\theta_{1,1}$, $\theta_{2,1}$, and $\theta_{3,1}$. 
Next, using the new parameters as starting values and repeating the prior iteration 
of random draws, we complete another Gibbs iteration to obtain the updated 
parameters $\theta_{1,2}$, $\theta_{2,2}$, and $\theta_{3,2}$. We can repeat the previous iterations $m$ times to 
obtain a sequence of random draws: 

$$ (\theta_{1,1}, \theta_{2,1}, \theta_{3,1}), \ldots, (\theta_{1,m}, \theta_{2,m}, \theta_{3,m}). $$


Under some regularity conditions, it can be shown that, for a sufficiently large 
$m$, $(\theta_{1,m}, \theta_{2,m}, \theta_{3,m})$ is approximately equivalent to a random draw from the joint 
distribution $f(\theta_1, \theta_2, \theta_3|X, M)$ of the three parameters. The regularity conditions 
are weak; they essentially require that for an arbitrary starting value $(\theta_{1,0}, \theta_{2,0}, \theta_{3,0})$, 
the prior Gibbs iterations have a chance to visit the full parameter space. The actual 
convergence theorem involves using the Markov chain theory; see Tierney (1994). 
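
The three-step iteration above can be sketched in Python for a case where the full conditionals are known exactly. The target used here is an assumption made for illustration only: a trivariate normal distribution with mean `mu` and precision matrix `Q`, for which the conditional distribution of $\theta_i$ given the other two components is normal with variance $1/Q_{ii}$ and mean $\mu_i - \sum_{j \ne i} Q_{ij}(\theta_j - \mu_j)/Q_{ii}$. The function name, the numerical values, and the burn-in length are hypothetical.

```python
import random


def gibbs_trivariate_normal(mu, Q, n_iter, start, seed=0):
    """Gibbs sampler for a 3-dimensional normal with mean mu and precision matrix Q."""
    rng = random.Random(seed)
    theta = list(start)  # only theta[1] and theta[2] matter; theta[0] is drawn first
    draws = []
    for _ in range(n_iter):
        for i in range(3):
            # Full conditional of theta_i given the current values of the other two.
            shift = sum(Q[i][j] * (theta[j] - mu[j]) for j in range(3) if j != i)
            cond_mean = mu[i] - shift / Q[i][i]
            cond_sd = (1.0 / Q[i][i]) ** 0.5
            theta[i] = rng.gauss(cond_mean, cond_sd)
        draws.append(tuple(theta))  # one completed Gibbs iteration
    return draws


# Hypothetical target: mean (1, 2, 3) and a positive-definite precision matrix.
mu = [1.0, 2.0, 3.0]
Q = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]

draws = gibbs_trivariate_normal(mu, Q, n_iter=21000, start=(10.0, -10.0, 10.0))
kept = draws[1000:]  # discard the early draws, which still reflect the starting value
means = [sum(d[i] for d in kept) / len(kept) for i in range(3)]
print(means)
```

Although the chain starts far from the target mean, the averages of the retained draws should be close to `mu`; this is only a check on a toy problem, not a convergence diagnostic.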

In practice, we use a sufficiently large $n$ and discard the first $m$ random draws 
of the Gibbs iterations to form a Gibbs sample, say, 

$$ (\theta_{1,m+1}, \theta_{2,m+1}, \theta_{3,m+1}), \ldots, (\theta_{1,n}, \theta_{2,n}, \theta_{3,n}). \tag{12.2} $$

Since the previous realizations form a random sample from the joint distribution 
$f(\theta_1, \theta_2, \theta_3|X, M)$, they can be used to make inference. For example, a point 
estimate of $\theta_i$ and its variance are 

$$ \hat{\theta}_i = \frac{1}{n-m} \sum_{j=m+1}^{n} \theta_{i,j}, \qquad \hat{\sigma}_i^2 = \frac{1}{n-m-1} \sum_{j=m+1}^{n} (\theta_{i,j} - \hat{\theta}_i)^2. \tag{12.3} $$


The Gibbs sample in Eq. (12.2) can be used in many ways. For example, if 
we are interested in testing the null hypothesis $H_o: \theta_1 = \theta_2$ versus the alternative 
hypothesis $H_a: \theta_1 \ne \theta_2$, then we can simply obtain the point estimate of $\theta = 
\theta_1 - \theta_2$ and its variance as 

$$ \hat{\theta} = \frac{1}{n-m} \sum_{j=m+1}^{n} (\theta_{1,j} - \theta_{2,j}), \qquad \hat{\sigma}^2 = \frac{1}{n-m-1} \sum_{j=m+1}^{n} (\theta_{1,j} - \theta_{2,j} - \hat{\theta})^2. $$

The null hypothesis can then be tested by using the conventional t-ratio statistic 
$t = \hat{\theta}/\hat{\sigma}$. 
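
The summaries in Eq. (12.3) and the t-ratio can be computed directly from stored draws. The sketch below is a minimal Python illustration; the function names and the tiny input lists are hypothetical and far shorter than any real Gibbs sample, and they serve only to show which draws are discarded and which divisor is used.

```python
import math
import statistics


def gibbs_summary(draws, burn_in):
    """Point estimate and variance of a parameter from draws m+1, ..., n."""
    kept = draws[burn_in:]
    estimate = statistics.mean(kept)
    variance = statistics.variance(kept, estimate)  # divisor is n - m - 1
    return estimate, variance


def t_ratio_equal(draws_1, draws_2, burn_in):
    """t-ratio for H_o: theta_1 = theta_2, based on the draws of theta_1 - theta_2."""
    diff = [a - b for a, b in zip(draws_1, draws_2)]
    estimate, variance = gibbs_summary(diff, burn_in)
    return estimate / math.sqrt(variance)


# Hypothetical draws of two parameters; the first draw is treated as burn-in.
theta_1 = [1.0, 2.0, 3.0, 4.0]
theta_2 = [0.5, 1.5, 2.0, 3.0]

print(gibbs_summary(theta_1, burn_in=1))
print(t_ratio_equal(theta_1, theta_2, burn_in=1))
```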


Remark. The first $m$ random draws of a Gibbs sampling, which are discarded, 
are commonly referred to as the burn-in sample. The burn-ins are used to ensure 
that the Gibbs sample in Eq. (12.2) is indeed close enough to a random sample 
from the joint distribution $f(\theta_1, \theta_2, \theta_3|X, M)$. 


Remark. The method discussed before consists of running a single long chain 
and keeping all random draws after the burn-ins to obtain a Gibbs sample. Alter- 
natively, one can run many relatively short chains using different starting values 
and a relatively small n. The random draw of the last Gibbs iteration in each chain 
is then used to form a Gibbs sample. 

From the prior introduction, Gibbs sampling has the advantage of decomposing 
a high-dimensional estimation problem into several lower dimensional ones via 
full conditional distributions of the parameters. At the extreme, a high-dimensional 
problem with $N$ parameters can be solved iteratively by using $N$ univariate 
conditional distributions. This property makes the Gibbs sampling simple and widely 
applicable. However, it is often not efficient to reduce all the Gibbs draws into a 
univariate problem. When parameters are highly correlated, it pays to draw them 
jointly. Consider the three-parameter illustrative example. If $\theta_1$ and $\theta_2$ are highly 
correlated, then one should employ the conditional distributions $f(\theta_1, \theta_2|\theta_3, X, M)$ 
and $f_3(\theta_3|\theta_1, \theta_2, X, M)$ whenever possible. A Gibbs iteration then consists of (a) 
drawing jointly $(\theta_1, \theta_2)$ given $\theta_3$, and (b) drawing $\theta_3$ given $(\theta_1, \theta_2)$. For more 
information on the impact of parameter correlations on the convergence rate of a Gibbs 
sampler, see Liu, Wong, and Kong (1994). 

In practice, convergence of a Gibbs sample is an important issue. The theory only 
states that the convergence occurs when the number of iterations m is sufficiently 
large. It provides no specific guidance for choosing m. Many methods have been 
devised in the literature for checking the convergence of a Gibbs sample. But there 
is no consensus on which method performs best. In fact, none of the available 
methods can guarantee 100% that the Gibbs sample under study has converged for 
all applications. Performance of a checking method often depends on the problem 
at hand. Care must be exercised in a real application to ensure that there is no 
obvious violation of the convergence requirement; see Carlin and Louis (2000) 
and Gelman et al. (2003) for convergence checking methods. In application, it is 
important to repeat the Gibbs sampling several times with different starting values 
to ensure that the algorithm has converged. 


12.3 BAYESIAN INFERENCE 


Conditional distributions play a key role in Gibbs sampling. In the statistical 
literature, these conditional distributions are referred to as conditional posterior 
distributions because they are distributions of parameters given the data, other 
parameter values, and the entertained model. In this section, we review some well- 
known posterior distributions that are useful in using MCMC methods. 


12.3.1 Posterior Distributions 


There are two approaches to statistical inference. The first approach is the classical 
approach based on the maximum-likelihood principle. Here a model is estimated by 
maximizing the likelihood function of the data, and the fitted model is used to make 
inference. The other approach is Bayesian inference that combines prior belief with 
data to obtain posterior distributions on which statistical inference is based. Histor- 
ically, there were heated debates between the two schools of statistical inference. 
Yet both approaches have proved to be useful and are now widely accepted. The 
methods discussed so far in this book belong to the classical approach. However, 
Bayesian solutions exist for all of the problems considered. This is particularly so in 
recent years with the advances in MCMC methods, which greatly improve the fea- 
sibility of Bayesian analysis. Readers can revisit the previous chapters and derive 
MCMC solutions for the problems considered. In most cases, the Bayesian solu- 
tions are similar to what we had before. In some cases, the Bayesian solutions might 
be advantageous. For example, consider the calculation of value at risk in Chapter 
7. A Bayesian solution can easily take into consideration the parameter uncertainty 
in VaR calculation. However, the approach requires intensive computation. 

Let $\boldsymbol{\theta}$ be the vector of unknown parameters of an entertained model and $X$ 
be the data. Bayesian analysis seeks to combine knowledge about the parameters 
with the data to make inference. Knowledge of the parameters is expressed by 
specifying a prior distribution for the parameters, which is denoted by $P(\boldsymbol{\theta})$. For 
a given model, denote the likelihood function of the data by $f(X|\boldsymbol{\theta})$. Then by the 
definition of conditional probability, 

$$ f(\boldsymbol{\theta}|X) = \frac{f(\boldsymbol{\theta}, X)}{f(X)} = \frac{f(X|\boldsymbol{\theta})P(\boldsymbol{\theta})}{f(X)}, \tag{12.4} $$

where the marginal distribution $f(X)$ can be obtained by 

$$ f(X) = \int f(X, \boldsymbol{\theta})\,d\boldsymbol{\theta} = \int f(X|\boldsymbol{\theta})P(\boldsymbol{\theta})\,d\boldsymbol{\theta}. $$

The distribution $f(\boldsymbol{\theta}|X)$ in Eq. (12.4) is called the posterior distribution of $\boldsymbol{\theta}$. In 
general, we can use Bayes's rule to obtain 

$$ f(\boldsymbol{\theta}|X) \propto f(X|\boldsymbol{\theta})P(\boldsymbol{\theta}), \tag{12.5} $$

where $P(\boldsymbol{\theta})$ is the prior distribution and $f(X|\boldsymbol{\theta})$ is the likelihood function. From 
Eq. (12.5), making statistical inference based on the likelihood function $f(X|\boldsymbol{\theta})$ 
amounts to using a Bayesian approach with a constant prior distribution. 


12.3.2 Conjugate Prior Distributions 


Obtaining the posterior distribution in Eq. (12.4) is not simple in general, but there 
are cases in which the prior and posterior distributions belong to the same family 
of distributions. Such a prior distribution is called a conjugate prior distribution. 
For MCMC methods, use of conjugate priors means that a closed-form solution 
for the conditional posterior distributions is available. Random draws of the Gibbs 
sampler can then be obtained by using the commonly available computer routines 
of probability distributions. In what follows, we review some well-known conjugate 
priors. For more information, readers are referred to textbooks on Bayesian statistics 
(e.g., DeGroot 1970, Chapter 9). 


Result 12.1. Suppose that $x_1, \ldots, x_n$ form a random sample from a normal 
distribution with mean $\mu$, which is unknown, and variance $\sigma^2$, which is known 
and positive. Suppose that the prior distribution of $\mu$ is a normal distribution with 
mean $\mu_o$ and variance $\sigma_o^2$. Then the posterior distribution of $\mu$ given the data and 
prior is normal with mean $\mu_*$ and variance $\sigma_*^2$ given by 

$$ \mu_* = \frac{\sigma^2 \mu_o + n\sigma_o^2 \bar{x}}{\sigma^2 + n\sigma_o^2} \quad \text{and} \quad \sigma_*^2 = \frac{\sigma^2 \sigma_o^2}{\sigma^2 + n\sigma_o^2}, $$

where $\bar{x} = \sum_{i=1}^{n} x_i/n$ is the sample mean. 

In Bayesian analysis, it is often convenient to use the precision parameter $\eta = 
1/\sigma^2$ (i.e., the inverse of the variance $\sigma^2$). Denote the precision parameter of the 
prior distribution by $\eta_o = 1/\sigma_o^2$ and that of the posterior distribution by $\eta_* = 1/\sigma_*^2$. 
Then Result 12.1 can be rewritten as 

$$ \eta_* = \eta_o + n\eta \quad \text{and} \quad \mu_* = \frac{\eta_o}{\eta_*}\,\mu_o + \frac{n\eta}{\eta_*}\,\bar{x}. $$

For the normal random sample considered, data information about $\mu$ is contained in 
the sample mean $\bar{x}$, which is the sufficient statistic of $\mu$. The precision of $\bar{x}$ is $n/\sigma^2 
= n\eta$. Consequently, Result 12.1 says that (a) precision of the posterior distribution 
is the sum of the precisions of the prior and the data, and (b) the posterior mean is 
a weighted average of the prior mean and sample mean with weight proportional 
to the precision. The two formulas also show that the contribution of the prior 
distribution is diminishing as the sample size $n$ increases. 
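
The two equivalent forms of Result 12.1 can be checked numerically. The Python sketch below is an illustration under assumed inputs: the function names and the data values are hypothetical. It computes the posterior mean and variance once from the variance form and once from the precision form.

```python
def normal_mean_posterior(xs, sigma_sq, mu_o, sigma_o_sq):
    """Posterior mean and variance of mu (variance form of Result 12.1)."""
    n = len(xs)
    xbar = sum(xs) / n
    denom = sigma_sq + n * sigma_o_sq
    mu_star = (sigma_sq * mu_o + n * sigma_o_sq * xbar) / denom
    sigma_star_sq = sigma_sq * sigma_o_sq / denom
    return mu_star, sigma_star_sq


def normal_mean_posterior_precision(xs, eta, mu_o, eta_o):
    """Same result written with precisions eta = 1/sigma^2 and eta_o = 1/sigma_o^2."""
    n = len(xs)
    xbar = sum(xs) / n
    eta_star = eta_o + n * eta  # precisions of the prior and the data add up
    mu_star = (eta_o * mu_o + n * eta * xbar) / eta_star  # precision-weighted average
    return mu_star, 1.0 / eta_star


# Hypothetical data with known variance 4 and a N(0, 1) prior for the mean.
xs = [1.0, 2.0, 3.0, 2.0]
print(normal_mean_posterior(xs, sigma_sq=4.0, mu_o=0.0, sigma_o_sq=1.0))
print(normal_mean_posterior_precision(xs, eta=0.25, mu_o=0.0, eta_o=1.0))
```

With these inputs the prior precision and the data precision are both 1, so the posterior mean lies halfway between the prior mean 0 and the sample mean 2.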


A multivariate version of Result 12.1 is particularly useful in MCMC methods 
when linear regression models are involved; see Box and Tiao (1973). 


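Result 12.1a. Suppose that $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$ form a random sample from a multivariate 
normal distribution with mean vector $\boldsymbol{\mu}$ and a known covariance matrix 
$\boldsymbol{\Sigma}$. Suppose also that the prior distribution of $\boldsymbol{\mu}$ is multivariate normal with mean 
vector $\boldsymbol{\mu}_o$ and covariance matrix $\boldsymbol{\Sigma}_o$. Then the posterior distribution of $\boldsymbol{\mu}$ is also 
multivariate normal with mean vector $\boldsymbol{\mu}_*$ and covariance matrix $\boldsymbol{\Sigma}_*$, where 

$$ \boldsymbol{\Sigma}_*^{-1} = \boldsymbol{\Sigma}_o^{-1} + n\boldsymbol{\Sigma}^{-1} \quad \text{and} \quad \boldsymbol{\mu}_* = \boldsymbol{\Sigma}_*\left(\boldsymbol{\Sigma}_o^{-1}\boldsymbol{\mu}_o + n\boldsymbol{\Sigma}^{-1}\bar{\boldsymbol{x}}\right), $$

where $\bar{\boldsymbol{x}} = \sum_{i=1}^{n} \boldsymbol{x}_i/n$ is the sample mean, which is distributed as a multivariate 
normal with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}/n$. Note that $n\boldsymbol{\Sigma}^{-1}$ is the precision 
matrix of $\bar{\boldsymbol{x}}$ and $\boldsymbol{\Sigma}_o^{-1}$ is the precision matrix of the prior distribution. 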


A random variable $\eta$ has a gamma distribution with positive parameters $\alpha$ and 
$\beta$ if its probability density function is 

$$ f(\eta|\alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\eta^{\alpha-1} e^{-\beta\eta}, \qquad \eta > 0, $$

where $\Gamma(\alpha)$ is a gamma function. For this distribution, $E(\eta) = \alpha/\beta$ and $\mathrm{Var}(\eta) = 
\alpha/\beta^2$. 


Result 12.2. Suppose that $x_1, \ldots, x_n$ form a random sample from a normal 
distribution with a given mean $\mu$ and an unknown precision $\eta$. If the prior distribution 
of $\eta$ is a gamma distribution with positive parameters $\alpha$ and $\beta$, then the 
posterior distribution of $\eta$ is a gamma distribution with parameters $\alpha + (n/2)$ and 
$\beta + \sum_{i=1}^{n} (x_i - \mu)^2/2$. 


A random variable $\theta$ has a beta distribution with positive parameters $\alpha$ and $\beta$ 
if its probability density function is 

$$ f(\theta|\alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}, \qquad 0 < \theta < 1. $$

The mean and variance of $\theta$ are $E(\theta) = \alpha/(\alpha+\beta)$ and $\mathrm{Var}(\theta) = \alpha\beta/[(\alpha+\beta)^2(\alpha+\beta+1)]$. 


Result 12.3. Suppose that $x_1, \ldots, x_n$ form a random sample from a Bernoulli 
distribution with parameter $\theta$. If the prior distribution of $\theta$ is a beta distribution 
with given positive parameters $\alpha$ and $\beta$, then the posterior of $\theta$ is a beta distribution 
with parameters $\alpha + \sum_{i=1}^{n} x_i$ and $\beta + n - \sum_{i=1}^{n} x_i$. 


Result 12.4. Suppose that $x_1, \ldots, x_n$ form a random sample from a Poisson 
distribution with parameter $\lambda$. Suppose also that the prior distribution of $\lambda$ is a 
gamma distribution with given positive parameters $\alpha$ and $\beta$. Then the posterior 
distribution of $\lambda$ is a gamma distribution with parameters $\alpha + \sum_{i=1}^{n} x_i$ and $\beta + n$. 


Result 12.5. Suppose that $x_1, \ldots, x_n$ form a random sample from an exponential 
distribution with parameter $\lambda$. If the prior distribution of $\lambda$ is a gamma distribution 
with given positive parameters $\alpha$ and $\beta$, then the posterior distribution of $\lambda$ is a 
gamma distribution with parameters $\alpha + n$ and $\beta + \sum_{i=1}^{n} x_i$. 
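
The conjugate updates in Results 12.2 to 12.5 amount to simple arithmetic on the prior parameters. The Python sketch below collects them as four small functions and shows how a draw from a gamma or beta posterior could be generated with the standard library. The function names and data are hypothetical. One point to note, as an assumption about parameterization: `random.gammavariate` takes a shape and a scale, whereas $\beta$ in the text acts as a rate, so the scale passed below is $1/\beta$.

```python
import random


def precision_posterior(xs, mu, alpha, beta):
    """Result 12.2: gamma posterior of the precision of a normal with known mean."""
    return alpha + len(xs) / 2.0, beta + sum((x - mu) ** 2 for x in xs) / 2.0


def bernoulli_posterior(xs, alpha, beta):
    """Result 12.3: beta posterior of the Bernoulli parameter."""
    successes = sum(xs)
    return alpha + successes, beta + len(xs) - successes


def poisson_posterior(xs, alpha, beta):
    """Result 12.4: gamma posterior of the Poisson parameter."""
    return alpha + sum(xs), beta + len(xs)


def exponential_posterior(xs, alpha, beta):
    """Result 12.5: gamma posterior of the exponential parameter."""
    return alpha + len(xs), beta + sum(xs)


rng = random.Random(2024)

# Beta posterior for hypothetical Bernoulli data, followed by one random draw.
a_post, b_post = bernoulli_posterior([1, 0, 1, 1], alpha=2.0, beta=2.0)
theta_draw = rng.betavariate(a_post, b_post)

# Gamma posterior for hypothetical Poisson counts; beta is a rate, so scale = 1/beta.
g_shape, g_rate = poisson_posterior([2, 0, 3], alpha=1.0, beta=1.0)
lambda_draw = rng.gammavariate(g_shape, 1.0 / g_rate)

print((a_post, b_post), theta_draw)
print((g_shape, g_rate), lambda_draw)
```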

A random variable $X$ has a negative binomial distribution with parameters $m$ 
and $\lambda$, where $m > 0$ and $0 < \lambda < 1$, if $X$ has a probability mass function 

$$ p(n|m, \lambda) = \begin{cases} \binom{m+n-1}{n} \lambda^m (1-\lambda)^n & \text{if } n = 0, 1, \ldots, \\ 0 & \text{otherwise.} \end{cases} $$

A simple example of negative binomial distribution in finance is how many MBA 
graduates a firm must interview before finding exactly $m$ "right candidates" for its 
$m$ openings, assuming that the applicants are independent and each applicant has 
a probability $\lambda$ of being a perfect fit. Denote the total number of interviews by $Y$. 
Then $X = Y - m$ is distributed as a negative binomial with parameters $m$ and $\lambda$. 


Result 12.6. Suppose that $x_1, \ldots, x_n$ form a random sample from a negative 
binomial distribution with parameters $m$ and $\lambda$, where $m$ is positive and fixed. If 
the prior distribution of $\lambda$ is a beta distribution with positive parameters $\alpha$ and $\beta$, 
then the posterior distribution of $\lambda$ is a beta distribution with parameters $\alpha + mn$ 
and $\beta + \sum_{i=1}^{n} x_i$. 


Next we consider the case of a normal distribution with an unknown mean $\mu$ 
and an unknown precision $\eta$. The two-dimensional prior distribution is partitioned 
as $P(\mu, \eta) = P(\mu|\eta)P(\eta)$. 


Result 12.7. Suppose that $x_1, \ldots, x_n$ form a random sample from a normal 
distribution with an unknown mean $\mu$ and an unknown precision $\eta$. Suppose also that 
the conditional distribution of $\mu$ given $\eta = \eta_o$ is a normal distribution with mean 
$\mu_o$ and precision $\tau_o \eta_o$ and the marginal distribution of $\eta$ is a gamma distribution 
with positive parameters $\alpha$ and $\beta$. Then the conditional posterior distribution of $\mu$ 
given $\eta = \eta_o$ is a normal distribution with mean $\mu_*$ and precision $\eta_*$, 

$$ \mu_* = \frac{\tau_o \mu_o + n\bar{x}}{\tau_o + n} \quad \text{and} \quad \eta_* = (\tau_o + n)\eta_o, $$

where $\bar{x} = \sum_{i=1}^{n} x_i/n$ is the sample mean, and the marginal posterior distribution 
of $\eta$ is a gamma distribution with parameters $\alpha + (n/2)$ and $\beta_*$, where 

$$ \beta_* = \beta + \frac{1}{2}\sum_{i=1}^{n} (x_i - \bar{x})^2 + \frac{\tau_o n (\bar{x} - \mu_o)^2}{2(\tau_o + n)}. $$


When the conditional variance of a random variable is of interest, an inverted 
chi-squared distribution (or inverse chi-squared) is often used. A random variable $Y$ 
has an inverted chi-squared distribution with $v$ degrees of freedom if $1/Y$ follows a 
chi-squared distribution with the same degrees of freedom. The probability density 
function of $Y$ is 

$$ f(y|v) = \frac{2^{-v/2}}{\Gamma(v/2)}\,y^{-(v/2+1)} e^{-1/(2y)}, \qquad y > 0. $$

For this distribution, we have $E(Y) = 1/(v-2)$ if $v > 2$ and $\mathrm{Var}(Y) = 2/[(v-2)^2(v-4)]$ 
if $v > 4$. 


Result 12.8. Suppose that $a_1, \ldots, a_n$ form a random sample from a normal 
distribution with mean zero and variance $\sigma^2$. Suppose also that the prior distribution 
of $\sigma^2$ is an inverted chi-squared distribution with $v$ degrees of freedom 
[i.e., $(v\lambda)/\sigma^2 \sim \chi_v^2$, where $\lambda > 0$]. Then the posterior distribution of $\sigma^2$ is 
also an inverted chi-squared distribution with $v + n$ degrees of freedom, that is, 

$$ \frac{v\lambda + \sum_{t=1}^{n} a_t^2}{\sigma^2} \sim \chi_{v+n}^2. $$
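
Result 12.8 gives a direct recipe for drawing $\sigma^2$ in a Gibbs sampler: draw a chi-squared variate with $v + n$ degrees of freedom and divide $v\lambda + \sum a_t^2$ by it. The Python sketch below is a minimal illustration with hypothetical residuals and hyperparameters. It relies on the fact that a chi-squared variate with $k$ degrees of freedom is a gamma variate with shape $k/2$ and scale 2.

```python
import random


def draw_sigma_sq(residuals, v, lam, rng):
    """One draw of sigma^2 from its inverted chi-squared posterior (Result 12.8)."""
    scale = v * lam + sum(a * a for a in residuals)
    dof = v + len(residuals)
    chi_sq = rng.gammavariate(dof / 2.0, 2.0)  # chi-squared with `dof` degrees of freedom
    return scale / chi_sq


rng = random.Random(12345)
residuals = [0.5, -0.5] * 10  # hypothetical residuals: n = 20, sum of squares = 5
draws = [draw_sigma_sq(residuals, v=5, lam=1.0, rng=rng) for _ in range(20000)]
mean_draw = sum(draws) / len(draws)
print(mean_draw)  # theoretical posterior mean is (v*lam + sum a_t^2)/(v + n - 2) = 10/23
```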


12.4 ALTERNATIVE ALGORITHMS 


In many applications, there are no closed-form solutions for the conditional poste- 
rior distributions. But many clever alternative algorithms have been devised in the 
statistical literature to overcome this difficulty. In this section, we discuss some of 
these algorithms. 


12.4.1 Metropolis Algorithm 


This algorithm is applicable when the conditional posterior distribution is known 
except for a normalization constant; see Metropolis and Ulam (1949) and Metropolis 
et al. (1953). Suppose that we want to draw a random sample from the distribution 
$f(\boldsymbol{\theta}|X)$, which contains a complicated normalization constant so that a direct 
draw is either too time-consuming or infeasible. But there exists an approximate 
distribution for which random draws are easily available. The Metropolis algorithm 
generates a sequence of random draws from the approximate distribution whose 
distributions converge to $f(\boldsymbol{\theta}|X)$. The algorithm proceeds as follows: 


1. Draw a random starting value $\boldsymbol{\theta}_0$ such that $f(\boldsymbol{\theta}_0|X) > 0$. 
2. For $t = 1, 2, \ldots$, 

a. Draw a candidate sample $\boldsymbol{\theta}_*$ from a known distribution at iteration $t$ given 
the previous draw $\boldsymbol{\theta}_{t-1}$. Denote the known distribution by $J_t(\boldsymbol{\theta}_t|\boldsymbol{\theta}_{t-1})$, 
which is called a jumping distribution in Gelman et al. (2003). It is also 
referred to as a proposal distribution. The jumping distribution must be 
symmetric, that is, $J_t(\boldsymbol{\theta}_i|\boldsymbol{\theta}_j) = J_t(\boldsymbol{\theta}_j|\boldsymbol{\theta}_i)$ for all $\boldsymbol{\theta}_i$, $\boldsymbol{\theta}_j$, and $t$. 

b. Calculate the ratio 

$$ r = \frac{f(\boldsymbol{\theta}_*|X)}{f(\boldsymbol{\theta}_{t-1}|X)}. $$

c. Set 

$$ \boldsymbol{\theta}_t = \begin{cases} \boldsymbol{\theta}_* & \text{with probability } \min(r, 1), \\ \boldsymbol{\theta}_{t-1} & \text{otherwise.} \end{cases} $$


Under some regularity conditions, the sequence $\{\boldsymbol{\theta}_t\}$ converges in distribution to 
$f(\boldsymbol{\theta}|X)$; see Gelman et al. (2003). 

Implementation of the algorithm requires the ability to calculate the ratio $r$ for 
all $\boldsymbol{\theta}_*$ and $\boldsymbol{\theta}_{t-1}$, to draw $\boldsymbol{\theta}_*$ from the jumping distribution, and to draw a random 
realization from a uniform distribution to determine the acceptance or rejection of 
$\boldsymbol{\theta}_*$. The normalization constant of $f(\boldsymbol{\theta}|X)$ is not needed because only a ratio is 
used. 

The acceptance and rejection rule of the algorithm can be stated as follows: 
(i) if the jump from $\boldsymbol{\theta}_{t-1}$ to $\boldsymbol{\theta}_*$ increases the conditional posterior density, then 
accept $\boldsymbol{\theta}_*$ as $\boldsymbol{\theta}_t$; (ii) if the jump decreases the posterior density, then set $\boldsymbol{\theta}_t = \boldsymbol{\theta}_*$ 
with probability equal to the density ratio $r$, and set $\boldsymbol{\theta}_t = \boldsymbol{\theta}_{t-1}$ otherwise. Such a 
procedure seems reasonable. 

Examples of symmetric jumping distributions include the normal and Student-t 
distributions for the mean parameter. For a given covariance matrix, we have 
$f(\boldsymbol{\theta}_i|\boldsymbol{\theta}_j) = f(\boldsymbol{\theta}_j|\boldsymbol{\theta}_i)$, where $f(\boldsymbol{\theta}|\boldsymbol{\theta}_o)$ denotes a multivariate normal density function 
with mean vector $\boldsymbol{\theta}_o$. 
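
The Metropolis steps can be sketched in a few lines of Python for a scalar parameter. Everything specific in the sketch is an assumption made for illustration: the target is an unnormalized gamma-type density $\theta^2 e^{-\theta}$ on $\theta > 0$ (whose mean is 3), the jumping distribution is a normal centered at the previous draw, and the step size and chain length are arbitrary. The ratio is computed on the log scale to avoid numerical underflow.

```python
import math
import random


def log_target(theta):
    """Log of an unnormalized density: theta^2 * exp(-theta) for theta > 0."""
    if theta <= 0.0:
        return -math.inf
    return 2.0 * math.log(theta) - theta


def metropolis(log_f, theta_0, n_iter, step_sd, seed=0):
    """Random-walk Metropolis with a symmetric normal jumping distribution."""
    rng = random.Random(seed)
    theta = theta_0
    log_f_theta = log_f(theta)  # must be finite, i.e., f(theta_0) > 0
    draws = []
    for _ in range(n_iter):
        candidate = rng.gauss(theta, step_sd)  # symmetric jump around theta
        log_f_cand = log_f(candidate)
        log_r = log_f_cand - log_f_theta  # log of the density ratio r
        # Accept with probability min(r, 1); otherwise keep the previous draw.
        if log_r >= 0.0 or rng.random() < math.exp(log_r):
            theta, log_f_theta = candidate, log_f_cand
        draws.append(theta)
    return draws


draws = metropolis(log_target, theta_0=1.0, n_iter=50000, step_sd=2.0)
kept = draws[5000:]
mean_kept = sum(kept) / len(kept)
print(mean_kept)  # the target has mean 3, so this should be near 3
```

Only the ratio of densities enters the accept or reject decision, so the normalization constant of the target is never evaluated.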


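12.4.2 Metropolis-Hastings Algorithm 


Hastings (1970) generalizes the Metropolis algorithm in two ways. First, the jumping 
distribution does not have to be symmetric. Second, the jumping rule is modified to 

$$ r = \frac{f(\boldsymbol{\theta}_*|X)/J_t(\boldsymbol{\theta}_*|\boldsymbol{\theta}_{t-1})}{f(\boldsymbol{\theta}_{t-1}|X)/J_t(\boldsymbol{\theta}_{t-1}|\boldsymbol{\theta}_*)} = \frac{f(\boldsymbol{\theta}_*|X)\,J_t(\boldsymbol{\theta}_{t-1}|\boldsymbol{\theta}_*)}{f(\boldsymbol{\theta}_{t-1}|X)\,J_t(\boldsymbol{\theta}_*|\boldsymbol{\theta}_{t-1})}. $$

This modified algorithm is referred to as the Metropolis-Hastings algorithm. Tierney 
(1994) discusses methods to improve computational efficiency of the algorithm. 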
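
To illustrate the role of the jumping densities in the modified ratio, the Python sketch below uses an asymmetric proposal for a positive scalar parameter: the candidate is the previous draw multiplied by a lognormal factor. For this proposal, which is an assumption chosen for the example, the ratio of jumping densities $J_t(\theta_{t-1}|\theta_*)/J_t(\theta_*|\theta_{t-1})$ reduces to $\theta_*/\theta_{t-1}$. The target is the same hypothetical unnormalized density $\theta^2 e^{-\theta}$, with mean 3.

```python
import math
import random


def log_target(theta):
    """Log of an unnormalized density: theta^2 * exp(-theta) for theta > 0."""
    return 2.0 * math.log(theta) - theta


def metropolis_hastings(log_f, theta_0, n_iter, log_step_sd, seed=0):
    """Sampler with a multiplicative lognormal (asymmetric) jumping distribution."""
    rng = random.Random(seed)
    theta = theta_0  # must be positive
    draws = []
    for _ in range(n_iter):
        candidate = theta * math.exp(rng.gauss(0.0, log_step_sd))
        # log r = log f(cand) - log f(theta) + log J(theta|cand) - log J(cand|theta);
        # for the lognormal proposal the last two terms equal log(cand) - log(theta).
        log_r = (log_f(candidate) - log_f(theta)
                 + math.log(candidate) - math.log(theta))
        if log_r >= 0.0 or rng.random() < math.exp(log_r):
            theta = candidate
        draws.append(theta)
    return draws


draws = metropolis_hastings(log_target, theta_0=1.0, n_iter=50000, log_step_sd=0.8)
kept = draws[5000:]
mean_kept = sum(kept) / len(kept)
print(mean_kept)  # the target has mean 3, so this should be near 3
```

Dropping the correction term `log(candidate) - log(theta)` would turn this into a plain Metropolis step with a non-symmetric proposal, which in general targets the wrong distribution.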


12.4.3 Griddy Gibbs 


In financial applications, an entertained model may contain some nonlinear param- 
eters (e.g., the moving-average parameters in an ARMA model or the GARCH 
parameters in a volatility model). Since conditional posterior distributions of nonlinear 
parameters do not have a closed-form expression, implementing a Gibbs sampler 
in this situation may become complicated even with the Metropolis-Hastings 
algorithm. Tanner (1996) describes a simple procedure to obtain random draws in 
a Gibbs sampling when the conditional posterior distribution is univariate. The 
method is called the Griddy Gibbs sampler and is widely applicable. However, the 
method could be inefficient in a real application. 

Let $\theta_i$ be a scalar parameter with conditional posterior distribution 
$f(\theta_i|X, \boldsymbol{\theta}_{-i})$, where $\boldsymbol{\theta}_{-i}$ is the parameter vector after removing $\theta_i$. For instance, 
if $\boldsymbol{\theta} = (\theta_1, \theta_2, \theta_3)'$, then $\boldsymbol{\theta}_{-1} = (\theta_2, \theta_3)'$. The Griddy Gibbs proceeds as follows: 


1. Select a grid of points from a properly selected interval of $\theta_i$, say, $\theta_{i1} \le 
\theta_{i2} \le \cdots \le \theta_{im}$. Evaluate the conditional posterior density function to obtain 
$w_j = f(\theta_{ij}|X, \boldsymbol{\theta}_{-i})$ for $j = 1, \ldots, m$. 

2. Use $w_1, \ldots, w_m$ to obtain an approximation to the inverse cumulative 
distribution function (CDF) of $f(\theta_i|X, \boldsymbol{\theta}_{-i})$. 

3. Draw a uniform (0,1) random variate and transform the observation via the 
approximate inverse CDF to obtain a random draw for $\theta_i$. 


Some remarks on the Griddy Gibbs are in order. First, the normalization constant 
of the conditional posterior distribution $f(\theta_i|X, \boldsymbol{\theta}_{-i})$ is not needed because 
the inverse CDF can be obtained from $\{w_j\}_{j=1}^{m}$ directly. Second, a simple 
approximation to the inverse CDF is a discrete distribution for $\{\theta_{ij}\}_{j=1}^{m}$ with 
probability $p(\theta_{ij}) = w_j/\sum_{v=1}^{m} w_v$. Third, in a real application, selection of the interval 
$[\theta_{i1}, \theta_{im}]$ for the parameter $\theta_i$ must be checked carefully. A simple checking 
procedure is to consider the histogram of the Gibbs draws of $\theta_i$. If the histogram indicates 
substantial probability around $\theta_{i1}$ or $\theta_{im}$, then the interval must be expanded. However, 
if the histogram shows a concentration of probability inside the interval 
$[\theta_{i1}, \theta_{im}]$, then the interval is too wide and can be shortened. If the interval is 
too wide, then the Griddy Gibbs becomes inefficient because most of $w_j$ would be 
zero. Finally, the Griddy Gibbs or Metropolis-Hastings algorithm can be used in a 
Gibbs sampling to obtain random draws of some parameters. 
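
The three Griddy Gibbs steps, with the discrete approximation mentioned in the second remark, can be sketched in Python as follows. The function name, the grid, and the conditional density are hypothetical: the density is an unnormalized normal curve with mean 1 and standard deviation 0.5, standing in for a conditional posterior that would normally change at every Gibbs iteration.

```python
import bisect
import itertools
import math
import random


def griddy_draw(unnorm_density, grid, rng):
    """One Griddy Gibbs draw using the discrete approximation p_j = w_j / sum(w)."""
    weights = [unnorm_density(g) for g in grid]  # step 1: w_j on the grid
    cum = list(itertools.accumulate(weights))    # step 2: unnormalized CDF
    u = rng.random() * cum[-1]                   # step 3: uniform draw, rescaled
    j = bisect.bisect_right(cum, u)              # smallest j with cum[j] > u
    return grid[min(j, len(grid) - 1)]


def density(theta):
    """Hypothetical unnormalized conditional posterior: normal(1, 0.5^2) shape."""
    return math.exp(-0.5 * ((theta - 1.0) / 0.5) ** 2)


rng = random.Random(7)
grid = [-3.0 + 0.1 * j for j in range(81)]  # grid on the interval [-3, 5]
draws = [griddy_draw(density, grid, rng) for _ in range(2000)]
mean_draw = sum(draws) / len(draws)
print(mean_draw)  # should be close to 1
```

Because the cumulative sums are rescaled by their total, the normalization constant of the density never has to be computed, in line with the first remark.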


12.5 LINEAR REGRESSION WITH TIME SERIES ERRORS 


We are ready to consider some specific applications of MCMC methods. Examples 
discussed in the next few sections are for illustrative purposes only. The goal here 
is to highlight the applicability and usefulness of the methods. Understanding these 
examples can help readers gain insights into applications of MCMC methods in 
finance. 

The first example is to estimate a regression model with serially correlated 
errors. This is a topic discussed in Chapter 2, where we use SCA to perform the 
estimation. A simple version of the model is 

$$ y_t = \beta_0 + \beta_1 x_{1t} + \cdots + \beta_k x_{kt} + z_t, \qquad z_t = \phi z_{t-1} + a_t, $$

where $y_t$ is the dependent variable, $x_{it}$ are explanatory variables that may contain 
lagged values of $y_t$, and $z_t$ follows a simple AR(1) model with $\{a_t\}$ being a sequence 
of independent and identically distributed normal random variables with mean zero 
and variance $\sigma^2$. Denote the parameters of the model by $\boldsymbol{\theta} = (\boldsymbol{\beta}', \phi, \sigma^2)'$, where 
$\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_k)'$, and let $\boldsymbol{x}_t = (1, x_{1t}, \ldots, x_{kt})'$ be the vector of all regressors 
at time $t$, including a constant of unity. The model becomes 

$$ y_t = \boldsymbol{x}_t'\boldsymbol{\beta} + z_t, \qquad z_t = \phi z_{t-1} + a_t, \qquad t = 1, \ldots, n, \tag{12.6} $$

where $n$ is the sample size. 

A natural way to implement Gibbs sampling in this case is to iterate between 
regression estimation and time series estimation. If the time series model is known, 
we can estimate the regression model easily by using the least-squares method. 
However, if the regression model is known, we can obtain the time series $z_t$ by 
using $z_t = y_t - \boldsymbol{x}_t'\boldsymbol{\beta}$ and use the series to estimate the AR(1) model. Therefore, we 
need the following conditional posterior distributions: 

$$ f(\boldsymbol{\beta}|Y, X, \phi, \sigma^2), \qquad f(\phi|Y, X, \boldsymbol{\beta}, \sigma^2), \qquad f(\sigma^2|Y, X, \boldsymbol{\beta}, \phi), $$

where $Y = (y_1, \ldots, y_n)'$ and $X$ denotes the collection of all observations of 
explanatory variables. 

We use conjugate prior distributions to obtain closed-form expressions for the 
conditional posterior distributions. The prior distributions are 

$$\boldsymbol{\beta} \sim N(\boldsymbol{\beta}_o, \boldsymbol{\Sigma}_o), \qquad \phi \sim N(\phi_o, \sigma_o^2), \qquad \frac{v\lambda}{\sigma^2} \sim \chi_v^2, \tag{12.7}$$

where again $\sim$ denotes distribution, and $\boldsymbol{\beta}_o$, $\boldsymbol{\Sigma}_o$, $\lambda$, $v$, 
$\phi_o$, and $\sigma_o^2$ are known quantities. These quantities are referred to as 
hyperparameters in Bayesian inference. Their exact values depend on the problem at hand. 
Typically, we assume that $\boldsymbol{\beta}_o = \boldsymbol{0}$, $\phi_o = 0$, and 
$\boldsymbol{\Sigma}_o$ is a diagonal matrix with large diagonal elements. The prior 
distributions in Eq. (12.7) are assumed to be independent of each other. Thus, we 
use independent priors based on the partition of the parameter vector $\boldsymbol{\theta}$. 

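The conditional posterior distribution $f(\boldsymbol{\beta} \mid Y, X, \phi, \sigma^2)$ can be obtained by using 
Result 12.1a of Section 12.3. Specifically, given $\phi$, we define 

$$y_{o,t} = y_t - \phi y_{t-1}, \qquad \boldsymbol{x}_{o,t} = \boldsymbol{x}_t - \phi \boldsymbol{x}_{t-1}.$$

Using Eq. (12.6), we have 

$$y_{o,t} = \boldsymbol{\beta}'\boldsymbol{x}_{o,t} + a_t, \qquad t = 2, \ldots, n. \tag{12.8}$$

Under the assumption of $\{a_t\}$, Eq. (12.8) is a multiple linear regression. Therefore, 
information of the data about the parameter vector $\boldsymbol{\beta}$ is contained in its 
least-squares estimate 

$$\hat{\boldsymbol{\beta}} = \left( \sum_{t=2}^{n} \boldsymbol{x}_{o,t}\boldsymbol{x}_{o,t}' \right)^{-1} \left( \sum_{t=2}^{n} \boldsymbol{x}_{o,t}\, y_{o,t} \right),$$

which has a multivariate normal distribution 

$$\hat{\boldsymbol{\beta}} \sim N\left[ \boldsymbol{\beta},\; \sigma^2 \left( \sum_{t=2}^{n} \boldsymbol{x}_{o,t}\boldsymbol{x}_{o,t}' \right)^{-1} \right].$$

Using Result 12.1a, the posterior distribution of $\boldsymbol{\beta}$, given the data, $\phi$, and $\sigma^2$, is 
multivariate normal. We write the result as 

$$(\boldsymbol{\beta} \mid Y, X, \phi, \sigma^2) \sim N(\boldsymbol{\beta}_*, \boldsymbol{\Sigma}_*), \tag{12.9}$$

where the parameters are given by 

$$\boldsymbol{\Sigma}_*^{-1} = \frac{\sum_{t=2}^{n} \boldsymbol{x}_{o,t}\boldsymbol{x}_{o,t}'}{\sigma^2} + \boldsymbol{\Sigma}_o^{-1}, \qquad \boldsymbol{\beta}_* = \boldsymbol{\Sigma}_* \left( \frac{\sum_{t=2}^{n} \boldsymbol{x}_{o,t}\, y_{o,t}}{\sigma^2} + \boldsymbol{\Sigma}_o^{-1}\boldsymbol{\beta}_o \right).$$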
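
Next, consider the conditional posterior distribution of $\phi$ given $\boldsymbol{\beta}$, $\sigma^2$, and the 
data. Because $\boldsymbol{\beta}$ is given, we can calculate $z_t = y_t - \boldsymbol{\beta}'\boldsymbol{x}_t$ for all $t$ and consider 
the AR(1) model 

$$z_t = \phi z_{t-1} + a_t, \qquad t = 2, \ldots, n.$$

The information of the likelihood function about $\phi$ is contained in the least-squares 
estimate 

$$\hat{\phi} = \left( \sum_{t=2}^{n} z_{t-1}^2 \right)^{-1} \left( \sum_{t=2}^{n} z_{t-1} z_t \right),$$

which is normally distributed with mean $\phi$ and variance $\sigma^2 (\sum_{t=2}^{n} z_{t-1}^2)^{-1}$. Based 
on Result 12.1, the posterior distribution of $\phi$ is also normal with mean $\phi_*$ and 
variance $\sigma_*^2$, where 

$$\sigma_*^{-2} = \frac{\sum_{t=2}^{n} z_{t-1}^2}{\sigma^2} + \sigma_o^{-2}, \qquad \phi_* = \sigma_*^2 \left( \frac{\sum_{t=2}^{n} z_{t-1}^2}{\sigma^2}\, \hat{\phi} + \sigma_o^{-2} \phi_o \right). \tag{12.10}$$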

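Finally, turn to the posterior distribution of $\sigma^2$ given $\boldsymbol{\beta}$, $\phi$, and the data. Because 
$\boldsymbol{\beta}$ and $\phi$ are known, we can calculate 

$$a_t = z_t - \phi z_{t-1}, \qquad z_t = y_t - \boldsymbol{\beta}'\boldsymbol{x}_t, \qquad t = 2, \ldots, n.$$

By Result 12.8, the posterior distribution of $\sigma^2$ is an inverted chi-squared 
distribution; that is, 

$$\frac{v\lambda + \sum_{t=2}^{n} a_t^2}{\sigma^2} \sim \chi_{v+(n-1)}^2, \tag{12.11}$$

where $\chi_k^2$ denotes a chi-squared distribution with $k$ degrees of freedom. 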

Using the three conditional posterior distributions in Eqs. (12.9)-(12.11), we 
can estimate Eq. (12.6) via Gibbs sampling as follows: 

1. Specify the hyperparameter values of the priors in Eq. (12.7). 

2. Specify arbitrary starting values for $\boldsymbol{\beta}$, $\phi$, and $\sigma^2$ (e.g., the ordinary 
least-squares estimate of $\boldsymbol{\beta}$ without time series errors). 

3. Use the multivariate normal distribution in Eq. (12.9) to draw a random 
realization for $\boldsymbol{\beta}$. 

4. Use the univariate normal distribution in Eq. (12.10) to draw a random 
realization for $\phi$. 

5. Use the chi-squared distribution in Eq. (12.11) to draw a random realization 
for $\sigma^2$. 

Repeat steps 3-5 for many iterations to obtain a Gibbs sample. The sample means 
are then used as point estimates of the parameters of model (12.6). 
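
The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program used for the examples in this chapter: it assumes a single regressor without an intercept, so that the draw in Eq. (12.9) reduces to a univariate normal, and the function name `gibbs_reg_ar1`, its default hyperparameters, and the iteration counts are hypothetical choices made for the illustration. A chi-squared variate with $k$ degrees of freedom is generated as a gamma variate with shape $k/2$ and scale 2.

```python
import math
import random


def gibbs_reg_ar1(y, x, n_iter=600, burn=100, seed=1,
                  beta_o=0.0, var_beta_o=100.0,  # prior: beta ~ N(beta_o, var_beta_o)
                  phi_o=0.0, var_phi_o=1.0,      # prior: phi ~ N(phi_o, var_phi_o)
                  v=5, lam=0.1):                 # prior: v*lam/sigma2 ~ chi2_v
    """Gibbs sampler for y_t = beta*x_t + z_t, z_t = phi*z_{t-1} + a_t.

    Scalar-regressor simplification of Eqs. (12.9)-(12.11); returns the
    posterior means of (beta, phi, sigma2) after discarding burn-in draws.
    """
    rng = random.Random(seed)
    n = len(y)
    # Step 2: starting values from OLS that ignores the serial correlation.
    beta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    phi = 0.0
    sigma2 = sum((b - beta * a) ** 2 for a, b in zip(x, y)) / n
    draws = []
    for it in range(n_iter):
        # Step 3: draw beta from the transformed regression, Eqs. (12.8)-(12.9).
        yo = [y[t] - phi * y[t - 1] for t in range(1, n)]
        xo = [x[t] - phi * x[t - 1] for t in range(1, n)]
        prec = sum(a * a for a in xo) / sigma2 + 1.0 / var_beta_o
        mean = (sum(a * b for a, b in zip(xo, yo)) / sigma2
                + beta_o / var_beta_o) / prec
        beta = rng.gauss(mean, math.sqrt(1.0 / prec))
        # Step 4: draw phi from the AR(1) fit to z_t, Eq. (12.10).
        z = [y[t] - beta * x[t] for t in range(n)]
        szz = sum(z[t - 1] ** 2 for t in range(1, n))
        szl = sum(z[t] * z[t - 1] for t in range(1, n))
        prec = szz / sigma2 + 1.0 / var_phi_o
        mean = (szl / sigma2 + phi_o / var_phi_o) / prec
        phi = rng.gauss(mean, math.sqrt(1.0 / prec))
        # Step 5: draw sigma2 from the inverted chi-squared, Eq. (12.11).
        ssr = sum((z[t] - phi * z[t - 1]) ** 2 for t in range(1, n))
        chi2 = rng.gammavariate((v + n - 1) / 2.0, 2.0)
        sigma2 = (v * lam + ssr) / chi2
        if it >= burn:
            draws.append((beta, phi, sigma2))
    m = len(draws)
    return tuple(sum(d[i] for d in draws) / m for i in range(3))
```

With several regressors, step 3 would instead require a multivariate normal draw with mean $\boldsymbol{\beta}_*$ and covariance matrix $\boldsymbol{\Sigma}_*$; the scalar case is used here only to keep the sketch free of matrix algebra.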


Example 12.1. As an illustration, we revisit the example of U.S. weekly interest 
rates of Chapter 2. The data are the 1-year and 3-year Treasury constant maturity 
rates from January 5, 1962, to April 10, 2009, and are obtained from the Federal 
Reserve Bank of St. Louis. Because of unit-root nonstationarity, the dependent and 
independent variables are 

1. $c_{3t} = r_{3t} - r_{3,t-1}$, which is the weekly change in 3-year maturity rate, 
2. $c_{1t} = r_{1t} - r_{1,t-1}$, which is the weekly change in 1-year maturity rate, 

where the original interest rates $r_{it}$ are measured in percentages. In Chapter 2, we 
employed a linear regression model with an MA(1) error for the data. Here we 
consider an AR(2) model for the error process. Using the traditional approach in 
R, we obtain the model 

$$c_{3t} = 0.782 c_{1t} + z_t, \qquad z_t = 0.183 z_{t-1} - 0.036 z_{t-2} + a_t, \tag{12.12}$$

where $\hat{\sigma}_a = 0.068$. Standard errors of the coefficient estimates of Eq. (12.12) 
are 0.0075, 0.0201, and 0.0201, respectively. Except for a marginally significant 
residual ACF at lags 4 and 6, the preceding model seems adequate. 

Writing the model as 

$$c_{3t} = \beta c_{1t} + z_t, \qquad z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + a_t, \tag{12.13}$$

where $\{a_t\}$ is an independent sequence of $N(0, \sigma^2)$ random variables, we estimate 
the parameters by Gibbs sampling. The prior distributions used are 

$$\beta \sim N(0, 4), \qquad \boldsymbol{\phi} \sim N[\boldsymbol{0}, \mathrm{diag}(0.25, 0.16)], \qquad \frac{v\lambda}{\sigma^2} = \frac{10 \times 0.05}{\sigma^2} \sim \chi_{10}^2.$$

The initial parameter estimates are obtained by the ordinary least-squares method 
[i.e., by using a two-step procedure of fitting the linear regression model first, then 
fitting an AR(2) model to the regression residuals]. Since the sample size 2466 is 
large, the initial estimates are close to those given in Eq. (12.12). We iterated the 
Gibbs sampling for 2100 iterations but discarded results of the first 100 iterations. 
Table 12.1 gives the posterior means and standard errors of the parameters. From 
the table, the posterior mean of $\sigma$ is approximately 0.069. Figure 12.1 shows the 
time plots of the 2000 Gibbs draws of the parameters. The plots show that the draws 
are stable. Figure 12.2 gives the histogram of the marginal posterior distribution of 
each parameter. 

We repeated the Gibbs sampling with different initial values but obtained similar 
results. The Gibbs sampling appears to have converged. From Table 12.1, the 
posterior means are close to the estimates of Eq. (12.12). This is expected as the 
sample size is large and the model is relatively simple. 

TABLE 12.1 Posterior Means and Standard Errors of Model (12.13) 
Estimated by Gibbs Sampling with 2100 Iterations(a) 

Parameter        $\beta$    $\phi_1$    $\phi_2$    $\sigma^2$ 
Mean             0.793      0.184       -0.036      0.00479 
Standard error   0.008      0.019       0.021       0.00013 

(a) The results are based on the last 2000 iterations, and the prior distributions are given 
in the text. 

Figure 12.1 Time plots of Gibbs draws for the model in Eq. (12.13) with 2100 iterations. Results are 
based on last 2000 draws. Prior distributions and starting parameter values are given in text. 


12.6 MISSING VALUES AND OUTLIERS 


In this section, we discuss MCMC methods for handling missing values and detecting 
additive outliers. Let $\{y_t\}_{t=1}^{n}$ be an observed time series. A data point $y_h$ is an 
additive outlier if 

$$y_t = \begin{cases} x_h + \omega & \text{if } t = h, \\ x_t & \text{otherwise}, \end{cases} \tag{12.14}$$

where $\omega$ is the magnitude of the outlier and $x_t$ is an outlier-free time series. 
Examples of additive outliers include recording errors (e.g., typos and measurement 
errors). Outliers can seriously affect time series analysis because they may induce 
substantial biases in parameter estimation and lead to model misspecification. 


Figure 12.2 Histograms of Gibbs draws for model in Eq. (12.13) with 2100 iterations; the four panels 
show $\beta$, $\phi_2$, $\phi_1$, and $\sigma^2$. Results are based on last 2000 draws. Prior distributions 
and starting parameter values are given in text. 


Consider a time series $x_t$ and a fixed time index $h$. We can learn a lot about $x_h$ by 
treating it as a missing value. If the model of $x_t$ were known, then we could derive 
the conditional distribution of $x_h$ given the other values of the series. By comparing 
the observed value $y_h$ with the derived distribution of $x_h$, we can determine whether 
$y_h$ can be classified as an additive outlier. Specifically, if $y_h$ is a value that is likely 
to occur under the derived distribution, then $y_h$ is not an additive outlier. However, 
if the chance to observe $y_h$ is very small under the derived distribution, then $y_h$ 
can be classified as an additive outlier. Therefore, detection of additive outliers and 
treatment of missing values in time series analysis are based on the same idea. 

In the literature, missing values in a time series can be handled by using either 
the Kalman filter or MCMC methods; see Jones (1980), Chapter 11, and McCulloch 
and Tsay (1994a). Outlier detection has also been carefully investigated; see Chang, 
Tiao, and Chen (1988), Tsay (1988), Tsay, Peña, and Pankratz (2000), and the 
references therein. The outliers are classified into four categories depending on the 
nature of their impacts on the time series. Here we focus on additive outliers. 


12.6.1 Missing Values 


For ease in presentation, consider an AR($p$) time series 

$$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + a_t, \tag{12.15}$$

where $\{a_t\}$ is a Gaussian white noise series with mean zero and variance $\sigma^2$. 
Suppose that the sampling period is from $t = 1$ to $t = n$, but the observation $x_h$ is 
missing, where $1 < h < n$. Our goal is to estimate the model in the presence of a 
missing value. 

In this particular instance, the parameters are $\boldsymbol{\theta} = (\boldsymbol{\phi}', x_h, \sigma^2)'$, where 
$\boldsymbol{\phi} = (\phi_1, \ldots, \phi_p)'$. Thus, we treat the missing value $x_h$ as an unknown parameter. 
If we assume that the prior distributions are 

$$\boldsymbol{\phi} \sim N(\boldsymbol{\phi}_o, \boldsymbol{\Sigma}_o), \qquad x_h \sim N(\mu_o, \sigma_o^2), \qquad \frac{v\lambda}{\sigma^2} \sim \chi_v^2,$$

where the hyperparameters are known, then the conditional posterior distributions 
$f(\boldsymbol{\phi} \mid X, x_h, \sigma^2)$ and $f(\sigma^2 \mid X, x_h, \boldsymbol{\phi})$ are exactly as those given in the previous 
section, where $X$ denotes the observed data. The conditional posterior distribution 
$f(x_h \mid X, \boldsymbol{\phi}, \sigma^2)$ is univariate normal with mean $\mu_*$ and variance $\sigma_h^2$. These two 
parameters can be obtained by using a linear regression model. Specifically, given 
the model and the data, $x_h$ is only related to $\{x_{h-p}, \ldots, x_{h-1}, x_{h+1}, \ldots, x_{h+p}\}$. 
Keeping in mind that $x_h$ is an unknown parameter, we can write the relationship 
as follows: 

1. For $t = h$, the model says 

$$x_h = \phi_1 x_{h-1} + \cdots + \phi_p x_{h-p} + a_h.$$

Letting $y_h = \phi_1 x_{h-1} + \cdots + \phi_p x_{h-p}$ and $b_h = -a_h$, the prior equation can 
be written as 

$$y_h = x_h + b_h = \phi_0 x_h + b_h,$$

where $\phi_0 = 1$. 

2. For $t = h + 1$, we have 

$$x_{h+1} = \phi_1 x_h + \phi_2 x_{h-1} + \cdots + \phi_p x_{h+1-p} + a_{h+1}.$$

Letting $y_{h+1} = x_{h+1} - \phi_2 x_{h-1} - \cdots - \phi_p x_{h+1-p}$ and $b_{h+1} = a_{h+1}$, the prior 
equation can be written as 

$$y_{h+1} = \phi_1 x_h + b_{h+1}.$$

3. In general, for $t = h + j$ with $j = 1, \ldots, p$, we have 

$$x_{h+j} = \phi_1 x_{h+j-1} + \cdots + \phi_j x_h + \phi_{j+1} x_{h-1} + \cdots + \phi_p x_{h+j-p} + a_{h+j}.$$

Let $y_{h+j} = x_{h+j} - \phi_1 x_{h+j-1} - \cdots - \phi_{j-1} x_{h+1} - \phi_{j+1} x_{h-1} - \cdots - \phi_p x_{h+j-p}$ 
and $b_{h+j} = a_{h+j}$. The prior equation reduces to 

$$y_{h+j} = \phi_j x_h + b_{h+j}.$$

Consequently, for an AR($p$) model, the missing value $x_h$ is related to the model 
and the data in $p + 1$ equations 

$$y_{h+j} = \phi_j x_h + b_{h+j}, \qquad j = 0, \ldots, p, \tag{12.16}$$

where $\phi_0 = 1$. Since a normal distribution is symmetric with respect to its mean, $a_h$ 
and $-a_h$ have the same distribution. Consequently, Eq. (12.16) is a special simple 
linear regression model with $p + 1$ data points. The least-squares estimate of $x_h$ 
and its variance are 

$$\hat{x}_h = \frac{\sum_{j=0}^{p} \phi_j y_{h+j}}{\sum_{j=0}^{p} \phi_j^2}, \qquad \mathrm{Var}(\hat{x}_h) = \frac{\sigma^2}{\sum_{j=0}^{p} \phi_j^2}.$$

For instance, when $p = 1$, we have $\hat{x}_h = [\phi_1/(1 + \phi_1^2)](x_{h-1} + x_{h+1})$, which is 
referred to as the filtered value of $x_h$. Because a Gaussian AR(1) model is time 
reversible, equal weights are applied to the two neighboring observations of $x_h$ to 
obtain the filtered value. 

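Finally, using Result 12.1, we obtain that the posterior distribution of $x_h$ is 
normal with mean $\mu_*$ and variance $\sigma_h^2$, where 

$$\mu_* = \frac{\sigma^2 \mu_o + \sigma_o^2 \left( \sum_{j=0}^{p} \phi_j^2 \right) \hat{x}_h}{\sigma^2 + \sigma_o^2 \sum_{j=0}^{p} \phi_j^2}, \qquad \sigma_h^2 = \frac{\sigma^2 \sigma_o^2}{\sigma^2 + \sigma_o^2 \sum_{j=0}^{p} \phi_j^2}. \tag{12.17}$$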
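
As a small illustration of Eqs. (12.16) and (12.17), the following Python sketch computes the filtered value $\hat{x}_h$ together with the posterior mean and variance of a single missing value. The function name `missing_value_posterior`, the 0-based indexing, and the default prior values are assumptions made for this sketch; it also assumes that $h$ is an interior point with $p$ observed neighbors on each side.

```python
def missing_value_posterior(x, h, phi, sigma2, mu_o=0.0, var_o=1e6):
    """Filtered value and posterior moments of a missing x[h] in an AR(p) series.

    x      : list of observations (0-based); x[h] is never read and may be None
    phi    : [phi_1, ..., phi_p]
    sigma2 : innovation variance
    mu_o, var_o : prior mean and variance of x_h (hypothetical defaults)
    Requires h - p >= 0 and h + p <= len(x) - 1.
    """
    p = len(phi)
    coef = [1.0] + list(phi)  # phi_0 = 1, phi_1, ..., phi_p
    ys = []
    for j in range(p + 1):
        if j == 0:
            # y_h = phi_1 x_{h-1} + ... + phi_p x_{h-p}
            yj = sum(phi[i - 1] * x[h - i] for i in range(1, p + 1))
        else:
            # y_{h+j} = x_{h+j} minus every AR term except the one in x_h
            yj = x[h + j] - sum(phi[i - 1] * x[h + j - i]
                                for i in range(1, p + 1) if i != j)
        ys.append(yj)
    s = sum(c * c for c in coef)                         # sum of phi_j^2
    x_hat = sum(c * v for c, v in zip(coef, ys)) / s     # least-squares estimate
    denom = sigma2 + var_o * s
    post_mean = (sigma2 * mu_o + var_o * s * x_hat) / denom  # Eq. (12.17)
    post_var = sigma2 * var_o / denom
    return x_hat, post_mean, post_var
```

With a diffuse prior (large `var_o`), the posterior mean is essentially the filtered value and the posterior variance is close to $\sigma^2/\sum_{j=0}^{p}\phi_j^2$.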


Missing values may occur in patches, resulting in the situation of multiple 
consecutive missing values. These missing values can be handled in two ways. First, 
we can generalize the prior method directly to obtain a solution for multiple 
filtered values. Consider, for instance, the case that $x_h$ and $x_{h+1}$ are missing. These 
missing values are related to $\{x_{h-p}, \ldots, x_{h-1}; x_{h+2}, \ldots, x_{h+p+1}\}$. We can define a 
dependent variable $y_{h+j}$ in a similar manner as before to set up a multiple linear 
regression with parameters $x_h$ and $x_{h+1}$. The least-squares method is then used to 
obtain estimates of $x_h$ and $x_{h+1}$. Combining with the specified prior distributions, 
we have a bivariate normal posterior distribution for $(x_h, x_{h+1})'$. In Gibbs sampling, 
this approach draws the consecutive missing values jointly. Second, we can apply 
the result of a single missing value in Eq. (12.17) multiple times within a Gibbs 
iteration. Again consider the case of missing $x_h$ and $x_{h+1}$. We can employ the 
conditional posterior distributions $f(x_h \mid X, x_{h+1}, \boldsymbol{\phi}, \sigma^2)$ and $f(x_{h+1} \mid X, x_h, \boldsymbol{\phi}, \sigma^2)$ 
separately. In Gibbs sampling, this means that we draw the missing values one at a 
time. 

Because $x_h$ and $x_{h+1}$ are correlated in a time series, drawing them jointly is 
preferred in a Gibbs sampling. This is particularly so if the number of consecutive 
missing values is large. Drawing one missing value at a time works well if the 
number of missing values is small. 


Remark. In the previous discussion, we assumed $h - p \ge 1$ and $h + p \le n$. 
If $h$ is close to the end points of the sample period, the number of data points 
available in the linear regression model must be adjusted. 


12.6.2 Outlier Detection 


Detection of additive outliers in Eq. (12.14) becomes straightforward under the 
MCMC framework. Except for the case of a patch of additive outliers with similar 
magnitudes, the simple Gibbs sampler of McCulloch and Tsay (1994a) seems to 
work well; see Justel, Peña, and Tsay (2001). Again we use an AR model to 
illustrate the problem. The method applies equally well to other time series models 
when the Metropolis-Hastings algorithm or the Griddy Gibbs is used to draw values 
of nonlinear parameters. 

Assume that the observed time series is $y_t$, which may contain some additive 
outliers whose locations and magnitudes are unknown. We write the model for 
$y_t$ as 

$$y_t = \delta_t \beta_t + x_t, \qquad t = 1, \ldots, n, \tag{12.18}$$

where $\{\delta_t\}$ is a sequence of independent Bernoulli random variables such that 
$P(\delta_t = 1) = \epsilon$ and $P(\delta_t = 0) = 1 - \epsilon$, $\epsilon$ is a constant between 0 and 1, $\{\beta_t\}$ is a 
sequence of independent random variables from a given distribution, and $x_t$ is an 
outlier-free AR($p$) time series, 

$$x_t = \phi_0 + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + a_t,$$

where $\{a_t\}$ is a Gaussian white noise with mean zero and variance $\sigma^2$. This model 
seems complicated, but it allows additive outliers to occur at every time point. The 
chance of being an outlier for each observation is $\epsilon$. 

Under the model in Eq. (12.18), we have $n$ data points, but there are $2n + p + 3$ 
parameters, namely, $\boldsymbol{\phi} = (\phi_0, \ldots, \phi_p)'$, $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_n)'$, 
$\boldsymbol{\beta} = (\beta_1, \ldots, \beta_n)'$, $\sigma^2$, and $\epsilon$. The binary parameters $\delta_t$ are governed by 
$\epsilon$ and the $\beta_t$ are determined by the specified distribution. The parameters 
$\boldsymbol{\delta}$ and $\boldsymbol{\beta}$ are introduced by using the idea of data augmentation, with $\delta_t$ 
denoting the presence or absence of an additive outlier at time $t$ and $\beta_t$ the 
magnitude of the outlier at time $t$ when it is present. 

Assume that the prior distributions are 

$$\boldsymbol{\phi} \sim N(\boldsymbol{\phi}_o, \boldsymbol{\Sigma}_o), \qquad \frac{v\lambda}{\sigma^2} \sim \chi_v^2, \qquad \epsilon \sim \mathrm{Beta}(\gamma_1, \gamma_2), \qquad \beta_t \sim N(0, \xi^2),$$

where the hyperparameters are known. These are conjugate prior distributions. To 
implement Gibbs sampling for model estimation with outlier detection, we need to 
consider the conditional posterior distributions of 

$$f(\boldsymbol{\phi} \mid Y, \boldsymbol{\delta}, \boldsymbol{\beta}, \sigma^2), \qquad f(\delta_h \mid Y, \boldsymbol{\delta}_{-h}, \boldsymbol{\beta}, \boldsymbol{\phi}, \sigma^2), \qquad f(\beta_h \mid Y, \boldsymbol{\delta}, \boldsymbol{\beta}_{-h}, \boldsymbol{\phi}, \sigma^2),$$

$$f(\epsilon \mid Y, \boldsymbol{\delta}), \qquad f(\sigma^2 \mid Y, \boldsymbol{\phi}, \boldsymbol{\delta}, \boldsymbol{\beta}),$$

where $1 \le h \le n$, $Y$ denotes the data, and $\boldsymbol{\theta}_{-i}$ denotes that the $i$th element of $\boldsymbol{\theta}$ is 
removed. 
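
Conditioned on $\boldsymbol{\delta}$ and $\boldsymbol{\beta}$, the outlier-free time series $x_t$ can be obtained by 
$x_t = y_t - \delta_t \beta_t$. Information of the data about $\boldsymbol{\phi}$ is then contained in the least-squares 
estimate 

$$\hat{\boldsymbol{\phi}} = \left( \sum_{t=p+1}^{n} \boldsymbol{x}_{t-1}\boldsymbol{x}_{t-1}' \right)^{-1} \left( \sum_{t=p+1}^{n} \boldsymbol{x}_{t-1} x_t \right),$$

where $\boldsymbol{x}_{t-1} = (1, x_{t-1}, \ldots, x_{t-p})'$, which is normally distributed with mean $\boldsymbol{\phi}$ and 
covariance matrix 

$$\hat{\boldsymbol{\Sigma}} = \sigma^2 \left( \sum_{t=p+1}^{n} \boldsymbol{x}_{t-1}\boldsymbol{x}_{t-1}' \right)^{-1}.$$

The conditional posterior distribution of $\boldsymbol{\phi}$ is therefore multivariate normal with 
mean $\boldsymbol{\phi}_*$ and covariance matrix $\boldsymbol{\Sigma}_*$, which are given in Eq. (12.9) with $\boldsymbol{\beta}$ being 
replaced by $\boldsymbol{\phi}$ and $\boldsymbol{x}_{o,t}$ by $\boldsymbol{x}_{t-1}$. Similarly, the conditional posterior distribution of 
$\sigma^2$ is an inverted chi-squared distribution; that is, 

$$\frac{v\lambda + \sum_{t=p+1}^{n} a_t^2}{\sigma^2} \sim \chi_{v+(n-p)}^2,$$

where $a_t = x_t - \boldsymbol{\phi}'\boldsymbol{x}_{t-1}$ and $x_t = y_t - \delta_t \beta_t$. 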

The conditional posterior distribution of $\delta_h$ can be obtained as follows. First, $\delta_h$ is 
only related to $\{y_j, \beta_j\}_{j=h-p}^{h+p}$, $\{\delta_j\}_{j=h-p}^{h+p}$ with $j \ne h$, $\boldsymbol{\phi}$, and $\sigma^2$. More specifically, 
we have 

$$x_j = y_j - \delta_j \beta_j, \qquad j \ne h.$$

Second, $x_h$ can assume two possible values: $x_h = y_h - \beta_h$ if $\delta_h = 1$ and $x_h = y_h$ 
otherwise. Define 

$$w_j = x_j^* - \phi_0 - \phi_1 x_{j-1}^* - \cdots - \phi_p x_{j-p}^*, \qquad j = h, \ldots, h + p,$$

where $x_j^* = x_j$ if $j \ne h$ and $x_h^* = y_h$. The two possible values of $x_h$ give rise to 
two situations: 

• Case I: $\delta_h = 0$. Here the $h$th observation is not an outlier and $x_h^* = y_h = x_h$. 
Hence, $w_j = a_j$ for $j = h, \ldots, h + p$. In other words, we have 

$$w_j \sim N(0, \sigma^2), \qquad j = h, \ldots, h + p.$$

• Case II: $\delta_h = 1$. Now the $h$th observation is an outlier and $x_h^* = y_h = x_h + \beta_h$. 
The $w_j$ defined before is contaminated by $\beta_h$. In fact, we have 

$$w_h \sim N(\beta_h, \sigma^2) \quad \text{and} \quad w_j \sim N(-\phi_{j-h}\beta_h, \sigma^2), \qquad j = h + 1, \ldots, h + p.$$

If we define $\psi_0 = -1$ and $\psi_i = \phi_i$ for $i = 1, \ldots, p$, then we have 
$w_j \sim N(-\psi_{j-h}\beta_h, \sigma^2)$ for $j = h, \ldots, h + p$. 

Based on the prior discussion, we can summarize the situation as follows: 

1. Case I: $\delta_h = 0$ with probability $1 - \epsilon$. In this case, $w_j \sim N(0, \sigma^2)$ for 
$j = h, \ldots, h + p$. 

2. Case II: $\delta_h = 1$ with probability $\epsilon$. Here $w_j \sim N(-\psi_{j-h}\beta_h, \sigma^2)$ for 
$j = h, \ldots, h + p$. 

Since there are $n$ data points, $j$ cannot be greater than $n$. Let $m = \min(n, h + p)$. 
The posterior distribution of $\delta_h$ is therefore 

$$P(\delta_h = 1 \mid Y, \boldsymbol{\delta}_{-h}, \boldsymbol{\beta}, \boldsymbol{\phi}, \sigma^2) = \frac{\epsilon \exp\left[ -\sum_{j=h}^{m} (w_j + \psi_{j-h}\beta_h)^2 / (2\sigma^2) \right]}{\epsilon \exp\left[ -\sum_{j=h}^{m} (w_j + \psi_{j-h}\beta_h)^2 / (2\sigma^2) \right] + (1 - \epsilon) \exp\left[ -\sum_{j=h}^{m} w_j^2 / (2\sigma^2) \right]}. \tag{12.19}$$

This posterior distribution simply compares the weighted values of the likelihood 
function under the two situations, with the weight being the probability of each 
situation. 
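
To make Eq. (12.19) concrete, the following Python sketch evaluates the posterior probability that observation $h$ is an additive outlier. The function name `outlier_probability`, the 0-based indexing, and the argument layout are assumptions of this sketch; it also assumes $h \ge p$ so that all lagged values exist. The two weights are combined on the log scale, a numerical convenience that leaves the ratio in Eq. (12.19) unchanged.

```python
import math


def outlier_probability(y, x, h, phi, beta_h, sigma2, eps):
    """Posterior probability P(delta_h = 1 | ...) of Eq. (12.19).

    y      : observed series (0-based list)
    x      : current outlier-adjusted series x_j = y_j - delta_j*beta_j, j != h
    h      : index being examined (requires h >= p)
    phi    : [phi_0, phi_1, ..., phi_p], with phi_0 the constant term
    beta_h : current outlier magnitude at h
    eps    : prior outlier probability epsilon
    """
    p = len(phi) - 1
    m = min(len(y) - 1, h + p)          # last usable index
    xs = list(x)
    xs[h] = y[h]                        # x*_h = y_h, x*_j = x_j otherwise
    psi = [-1.0] + list(phi[1:])        # psi_0 = -1, psi_i = phi_i
    s0 = s1 = 0.0
    for j in range(h, m + 1):
        w = xs[j] - phi[0] - sum(phi[i] * xs[j - i] for i in range(1, p + 1))
        s0 += w * w                               # Case I: no outlier
        s1 += (w + psi[j - h] * beta_h) ** 2      # Case II: outlier of size beta_h
    l1 = math.log(eps) - s1 / (2.0 * sigma2)
    l0 = math.log(1.0 - eps) - s0 / (2.0 * sigma2)
    top = max(l1, l0)                   # subtract the maximum to avoid underflow
    e1, e0 = math.exp(l1 - top), math.exp(l0 - top)
    return e1 / (e1 + e0)
```

In a Gibbs iteration, $\delta_h$ would then be drawn as a Bernoulli variable with this success probability.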

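Finally, the posterior distribution of $\beta_h$ is as follows. 

• If $\delta_h = 0$, then $y_h$ is not an outlier and $\beta_h \sim N(0, \xi^2)$. 

• If $\delta_h = 1$, then $y_h$ is contaminated by an outlier with magnitude $\beta_h$. The 
variable $w_j$ defined before contains information of $\beta_h$ for $j = h, h + 1, \ldots, \min(h + p, n)$. 
Specifically, we have $w_j \sim N(-\psi_{j-h}\beta_h, \sigma^2)$ for $j = h, h + 1, \ldots, \min(h + p, n)$. 
The information can be put in a linear regression framework as 

$$w_j = -\psi_{j-h}\beta_h + a_j, \qquad j = h, h + 1, \ldots, \min(h + p, n).$$

Consequently, the information is embedded in the least-squares estimate 

$$\hat{\beta}_h = \frac{\sum_{j=h}^{m} -\psi_{j-h} w_j}{\sum_{j=h}^{m} \psi_{j-h}^2}, \qquad m = \min(h + p, n),$$

which is normally distributed with mean $\beta_h$ and variance $\sigma^2 / (\sum_{j=h}^{m} \psi_{j-h}^2)$. 
By Result 12.1, the posterior distribution of $\beta_h$ is normal with mean $\beta_h^*$ and 
variance $\sigma_{h*}^2$, where 

$$\beta_h^* = \frac{-\left( \sum_{j=h}^{m} \psi_{j-h} w_j \right) \xi^2}{\sigma^2 + \left( \sum_{j=h}^{m} \psi_{j-h}^2 \right) \xi^2}, \qquad \sigma_{h*}^2 = \frac{\sigma^2 \xi^2}{\sigma^2 + \left( \sum_{j=h}^{m} \psi_{j-h}^2 \right) \xi^2}.$$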
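
The posterior moments of the outlier magnitude when $\delta_h = 1$ amount to a few lines of arithmetic. The sketch below, with the hypothetical name `outlier_size_posterior`, takes the values $w_h, \ldots, w_m$ and the matching $\psi_0, \ldots, \psi_{m-h}$ as given and returns $\beta_h^*$ and $\sigma_{h*}^2$; how the $w_j$ are computed is left to the caller.

```python
def outlier_size_posterior(w, psi, sigma2, xi2):
    """Posterior mean and variance of beta_h given delta_h = 1.

    w      : [w_h, ..., w_m]
    psi    : [psi_0, ..., psi_{m-h}] with psi_0 = -1 and psi_i = phi_i
    sigma2 : innovation variance
    xi2    : prior variance of beta_h
    """
    s = sum(c * c for c in psi)                      # sum of psi_{j-h}^2
    swp = sum(c * v for c, v in zip(psi, w))         # sum of psi_{j-h} * w_j
    denom = sigma2 + s * xi2
    return -swp * xi2 / denom, sigma2 * xi2 / denom
```

When $\xi^2$ is large relative to $\sigma^2$, the posterior mean approaches the least-squares estimate $\hat{\beta}_h$, as expected from Result 12.1.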


Example 12.2. Consider the weekly change series of U.S. 3-year Treasury 
constant maturity interest rate from March 18, 1988, to September 10, 1999, for 600 
observations. The interest rate is in percentage and is a subseries of the dependent 
variable $c_{3t}$ of Example 12.1. The time series is shown in Figure 12.3(a). If AR 
models are entertained for the series, the partial autocorrelation function suggests 
an AR(3) model and we obtain 

$$c_{3t} = 0.227 c_{3,t-1} + 0.006 c_{3,t-2} + 0.114 c_{3,t-3} + a_t, \qquad \hat{\sigma}_a^2 = 0.0128,$$

where standard errors of the coefficients are 0.041, 0.042, and 0.041, respectively. 
The Ljung-Box statistics of the residuals show $Q(12) = 11.4$, which is insignificant 
at the 5% level. 

Figure 12.3 Time plots of weekly change series of U.S. 3-year Treasury constant maturity interest 
rate from March 18, 1988, to September 10, 1999: (a) data, (b) posterior probability of being an outlier, 
and (c) posterior mean of outlier size. Estimation is based on Gibbs sampling with 1050 iterations with 
first 50 iterations as burn-ins. 


Next, we apply the Gibbs sampling to estimate the AR(3) model and to detect 
simultaneously possible additive outliers. The prior distributions used are 

$$\boldsymbol{\phi} \sim N(\boldsymbol{0}, 0.25\,\boldsymbol{I}_3), \qquad \frac{v\lambda}{\sigma^2} = \frac{5 \times 0.00256}{\sigma^2} \sim \chi_5^2, \qquad \gamma_1 = 5, \quad \gamma_2 = 95, \quad \xi^2 = 0.1,$$

where $0.00256 \approx \hat{\sigma}^2/5$ and $\xi^2 \approx 9\hat{\sigma}^2$. The expected number of additive outliers is 
5%. Using initial values $\epsilon = 0.05$, $\sigma^2 = 0.012$, $\phi_1 = 0.2$, $\phi_2 = 0.02$, and $\phi_3 = 0.1$, 
we run the Gibbs sampling for 1050 iterations but discard results of the first 
50 iterations. Using posterior means of the coefficients as parameter estimates, we 
obtain the fitted model 

$$c_{3t} = 0.252 c_{3,t-1} + 0.003 c_{3,t-2} + 0.110 c_{3,t-3} + a_t, \qquad \hat{\sigma}_a^2 = 0.0118,$$

where posterior standard deviations of the parameters are 0.046, 0.045, 0.046, and 
0.0008, respectively. Thus, the Gibbs sampling produces results similar to those 
of the maximum-likelihood method. Figure 12.3(b) shows the time plot of the 
posterior probability of each observation being an additive outlier, and Figure 12.3(c) 
plots the posterior mean of outlier magnitude. From the probability plot, some 
observations have high probabilities of being an outlier. In particular, $t = 323$ has 
a probability of 0.83 and the associated posterior mean of outlier magnitude is 
-0.304. This point corresponds to May 20, 1994, when $c_{3t}$ changed from 0.24 
to -0.34 (i.e., about a 0.6% drop in the weekly interest rate within 2 weeks). The 
point with second highest posterior probability of being an outlier is $t = 201$, which 
is January 17, 1992. The outlying posterior probability is 0.58 and the estimated 
outlier size is 0.176. At this particular time point, $c_{3t}$ changed from -0.02 to 0.33, 
corresponding to a jump of about 0.35% in the weekly interest rate. 


Remark. Outlier detection via Gibbs sampling requires intensive computation 
but the approach performs a joint estimation of model parameters and outliers. Yet 
the traditional approach to outlier detection separates estimation from detection. It 
is much faster in computation, but may produce spurious detections when multiple 
outliers are present. For the data in Example 12.2, the SCA program also identifies 
t = 323 and t = 201 as the two most significant additive outliers. The estimated 
outlier sizes are —0.39 and 0.36, respectively. 


12.7 STOCHASTIC VOLATILITY MODELS 


An important financial application of MCMC methods is the estimation of stochastic 
volatility models; see Jacquier, Polson, and Rossi (1994) and the references 
therein. We start with a univariate stochastic volatility model. The mean and volatility 
equations of an asset return $r_t$ are 

$$r_t = \beta_0 + \beta_1 x_{1t} + \cdots + \beta_p x_{pt} + a_t, \qquad a_t = \sqrt{h_t}\,\epsilon_t, \tag{12.20}$$

$$\ln h_t = \alpha_0 + \alpha_1 \ln h_{t-1} + v_t, \tag{12.21}$$

where $\{x_{it} \mid i = 1, \ldots, p\}$ are explanatory variables available at time $t - 1$, the $\beta_j$ 
are parameters, $\{\epsilon_t\}$ is a Gaussian white noise sequence with mean 0 and variance 
1, $\{v_t\}$ is also a Gaussian white noise sequence with mean 0 and variance $\sigma_v^2$, and 
$\{\epsilon_t\}$ and $\{v_t\}$ are independent. The log transformation is used to ensure that $h_t$ is 
positive for all $t$. The explanatory variables $x_{it}$ may include lagged values of the 
return (e.g., $x_{it} = r_{t-i}$). In Eq. (12.21), we assume that $|\alpha_1| < 1$ so that the log 
volatility process $\ln h_t$ is stationary. If necessary, a higher order AR($p$) model can 
be used for $\ln h_t$. 

Denote the coefficient vector of the mean equation by $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)'$ 
and the parameter vector of the volatility equation by $\boldsymbol{\omega} = (\alpha_0, \alpha_1, \sigma_v^2)'$. Suppose 
that $R = (r_1, \ldots, r_n)'$ is the collection of observed returns and $X$ is the collection 
of explanatory variables. Let $H = (h_1, \ldots, h_n)'$ be the vector of unobservable 
volatilities. Here $\boldsymbol{\beta}$ and $\boldsymbol{\omega}$ are the "traditional" parameters of the model and $H$ 
is an auxiliary variable. Estimation of the model would be complicated via the 
maximum-likelihood method because the likelihood function is a mixture over the 
$n$-dimensional $H$ distribution as 

$$f(R \mid X, \boldsymbol{\beta}, \boldsymbol{\omega}) = \int f(R \mid X, \boldsymbol{\beta}, H)\, f(H \mid \boldsymbol{\omega})\, dH.$$

However, under the Bayesian framework, the volatility vector $H$ consists of 
augmented parameters. Conditioning on $H$, we can focus on the probability distribution 
functions $f(R \mid H, \boldsymbol{\beta})$ and $f(H \mid \boldsymbol{\omega})$ and the prior distribution $p(\boldsymbol{\beta}, \boldsymbol{\omega})$. We assume 
that the prior distribution can be partitioned as $p(\boldsymbol{\beta}, \boldsymbol{\omega}) = p(\boldsymbol{\beta})\,p(\boldsymbol{\omega})$; that is, prior 
distributions for the mean and volatility equations are independent. A Gibbs 
sampling approach to estimating the stochastic volatility in Eqs. (12.20) and (12.21) 
then involves drawing random samples from the following conditional posterior 
distributions: 

$$f(\boldsymbol{\beta} \mid R, X, H, \boldsymbol{\omega}), \qquad f(H \mid R, X, \boldsymbol{\beta}, \boldsymbol{\omega}), \qquad f(\boldsymbol{\omega} \mid R, X, \boldsymbol{\beta}, H).$$

In what follows, we give details of practical implementation of the Gibbs sampling 
used. 

12.7.1 Estimation of Univariate Models 


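Given $H$, the mean equation in (12.20) is a nonhomogeneous linear regression. 
Dividing the equation by $\sqrt{h_t}$, we can write the model as 

$$r_{o,t} = \boldsymbol{x}_{o,t}'\boldsymbol{\beta} + \epsilon_t, \qquad t = 1, \ldots, n, \tag{12.22}$$

where $r_{o,t} = r_t/\sqrt{h_t}$ and $\boldsymbol{x}_{o,t} = \boldsymbol{x}_t/\sqrt{h_t}$, with $\boldsymbol{x}_t = (1, x_{1t}, \ldots, x_{pt})'$ being the 
vector of explanatory variables. Suppose that the prior distribution of $\boldsymbol{\beta}$ is multivariate 
normal with mean $\boldsymbol{\beta}_o$ and covariance matrix $\boldsymbol{A}_o$. Then the posterior distribution of 
$\boldsymbol{\beta}$ is also multivariate normal with mean $\boldsymbol{\beta}_*$ and covariance matrix $\boldsymbol{A}_*$. These two 
quantities can be obtained as before via Result 12.1a, and they are 

$$\boldsymbol{A}_*^{-1} = \sum_{t=1}^{n} \boldsymbol{x}_{o,t}\boldsymbol{x}_{o,t}' + \boldsymbol{A}_o^{-1}, \qquad \boldsymbol{\beta}_* = \boldsymbol{A}_* \left( \sum_{t=1}^{n} \boldsymbol{x}_{o,t}\, r_{o,t} + \boldsymbol{A}_o^{-1}\boldsymbol{\beta}_o \right),$$

where it is understood that the summation starts with $p + 1$ if $r_{t-p}$ is the highest 
lagged return used in the explanatory variables. 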

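The volatility vector $H$ is drawn element by element. The necessary conditional 
posterior distribution is $f(h_t \mid R, X, H_{-t}, \boldsymbol{\beta}, \boldsymbol{\omega})$, which is produced by the normal 
distribution of $a_t$ and the lognormal distribution of the volatility, 

$$\begin{aligned}
f(h_t \mid R, X, \boldsymbol{\beta}, H_{-t}, \boldsymbol{\omega})
&\propto f(a_t \mid h_t, r_t, \boldsymbol{x}_t, \boldsymbol{\beta})\, f(h_t \mid h_{t-1}, \boldsymbol{\omega})\, f(h_{t+1} \mid h_t, \boldsymbol{\omega}) \\
&\propto h_t^{-0.5} \exp\left[ -\frac{(r_t - \boldsymbol{x}_t'\boldsymbol{\beta})^2}{2h_t} \right] h_t^{-1} \exp\left[ -\frac{(\ln h_t - \mu_t)^2}{2\sigma^2} \right] \\
&\propto h_t^{-1.5} \exp\left[ -\frac{(r_t - \boldsymbol{x}_t'\boldsymbol{\beta})^2}{2h_t} - \frac{(\ln h_t - \mu_t)^2}{2\sigma^2} \right],
\end{aligned} \tag{12.23}$$

where $\mu_t = [\alpha_0(1 - \alpha_1) + \alpha_1(\ln h_{t+1} + \ln h_{t-1})]/(1 + \alpha_1^2)$ and $\sigma^2 = \sigma_v^2/(1 + \alpha_1^2)$. 
Here we have used the following properties: (a) $a_t \mid h_t \sim N(0, h_t)$; 
(b) $\ln h_t \mid \ln h_{t-1} \sim N(\alpha_0 + \alpha_1 \ln h_{t-1}, \sigma_v^2)$; 
(c) $\ln h_{t+1} \mid \ln h_t \sim N(\alpha_0 + \alpha_1 \ln h_t, \sigma_v^2)$; 
(d) $d \ln h_t = h_t^{-1}\, dh_t$, where $d$ denotes differentiation; and (e) the equality 

$$(x - a)^2 A + (x - b)^2 C = (x - c)^2 (A + C) + (a - b)^2 AC/(A + C),$$

where $c = (Aa + Cb)/(A + C)$ provided that $A + C \ne 0$. This equality is a scalar 
version of Lemma 1 of Box and Tiao (1973, p. 418). In our application, $A = 1$, 
$a = \alpha_0 + \alpha_1 \ln h_{t-1}$, $C = \alpha_1^2$, and $b = (\ln h_{t+1} - \alpha_0)/\alpha_1$. The term $(a - b)^2 AC/(A + C)$ 
does not contain the random variable $h_t$ and, hence, is integrated out in the 
derivation of the conditional posterior distribution. Jacquier, Polson, and Rossi (1994) use 
the Metropolis algorithm to draw $h_t$. We use Griddy Gibbs in this section, and the 
range of $h_t$ is chosen to be a multiple of the unconditional sample variance of $r_t$. 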
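
A Griddy Gibbs draw of a single $h_t$ from the kernel in Eq. (12.23) can be sketched in Python as follows. This is an illustrative simplification rather than the implementation used for the examples: it draws from the discrete distribution on an equally spaced grid by inverting the cumulative weights, without interpolating between grid points. The names `log_kernel` and `griddy_draw_h`, the argument `resid` (standing for $r_t - \boldsymbol{x}_t'\boldsymbol{\beta}$), and the grid limits `lo` and `hi` are assumptions of the sketch.

```python
import math
import random


def log_kernel(h, resid, mu_t, s2):
    """Log of the unnormalized conditional density of h_t in Eq. (12.23)."""
    return (-1.5 * math.log(h) - resid ** 2 / (2.0 * h)
            - (math.log(h) - mu_t) ** 2 / (2.0 * s2))


def griddy_draw_h(resid, h_prev, h_next, a0, a1, sv2, lo, hi, m=400, rng=random):
    """Draw h_t on a grid over [lo, hi] given its neighbors h_{t-1} and h_{t+1}.

    a0, a1, sv2 are alpha_0, alpha_1, and sigma_v^2 of Eq. (12.21).
    """
    mu_t = (a0 * (1.0 - a1)
            + a1 * (math.log(h_next) + math.log(h_prev))) / (1.0 + a1 * a1)
    s2 = sv2 / (1.0 + a1 * a1)
    # Midpoints of m equal-width cells covering [lo, hi].
    grid = [lo + (hi - lo) * (i + 0.5) / m for i in range(m)]
    logw = [log_kernel(g, resid, mu_t, s2) for g in grid]
    top = max(logw)                      # rescale before exponentiating
    w = [math.exp(v - top) for v in logw]
    u = rng.random() * sum(w)            # uniform draw on the cumulative weights
    acc = 0.0
    for g, wi in zip(grid, w):
        acc += wi
        if acc >= u:
            return g
    return grid[-1]
```

If a histogram of the draws piles up near `lo` or `hi`, the interval should be widened; if nearly all of the weights are zero, it should be shortened.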

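To draw random samples of $\boldsymbol{\omega}$, we partition the parameters as $\boldsymbol{\alpha} = (\alpha_0, \alpha_1)'$ 
and $\sigma_v^2$. The prior distribution of $\boldsymbol{\omega}$ is also partitioned accordingly [i.e., 
$p(\boldsymbol{\omega}) = p(\boldsymbol{\alpha})\,p(\sigma_v^2)$]. The conditional posterior distributions needed are 

• $f(\boldsymbol{\alpha} \mid R, X, H, \boldsymbol{\beta}, \sigma_v^2) = f(\boldsymbol{\alpha} \mid H, \sigma_v^2)$: Given $H$, $\ln h_t$ follows an AR(1) 
model. Therefore, the result of AR models discussed in the previous two 
sections applies. Specifically, if the prior distribution of $\boldsymbol{\alpha}$ is multivariate 
normal with mean $\boldsymbol{\alpha}_o$ and covariance matrix $\boldsymbol{C}_o$, then $f(\boldsymbol{\alpha} \mid H, \sigma_v^2)$ is 
multivariate normal with mean $\boldsymbol{\alpha}_*$ and covariance matrix $\boldsymbol{C}_*$, where 

$$\boldsymbol{C}_*^{-1} = \frac{\sum_{t=2}^{n} \boldsymbol{z}_t \boldsymbol{z}_t'}{\sigma_v^2} + \boldsymbol{C}_o^{-1}, \qquad \boldsymbol{\alpha}_* = \boldsymbol{C}_* \left( \frac{\sum_{t=2}^{n} \boldsymbol{z}_t \ln h_t}{\sigma_v^2} + \boldsymbol{C}_o^{-1}\boldsymbol{\alpha}_o \right),$$

where $\boldsymbol{z}_t = (1, \ln h_{t-1})'$. 

• $f(\sigma_v^2 \mid R, X, H, \boldsymbol{\beta}, \boldsymbol{\alpha}) = f(\sigma_v^2 \mid H, \boldsymbol{\alpha})$: Given $H$ and $\boldsymbol{\alpha}$, we can calculate 
$v_t = \ln h_t - \alpha_0 - \alpha_1 \ln h_{t-1}$ for $t = 2, \ldots, n$. Therefore, if the prior distribution 
of $\sigma_v^2$ is $(m\lambda)/\sigma_v^2 \sim \chi_m^2$, then the conditional posterior distribution of $\sigma_v^2$ is 
an inverted chi-squared distribution with $m + n - 1$ degrees of freedom; that 
is, 

$$\frac{m\lambda + \sum_{t=2}^{n} v_t^2}{\sigma_v^2} \sim \chi_{m+n-1}^2.$$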
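
The inverted chi-squared draw for $\sigma_v^2$ can be sketched in Python using only the standard library, since a chi-squared variate with $k$ degrees of freedom is a gamma variate with shape $k/2$ and scale 2. The function name `draw_sigma_v2` and its argument names are hypothetical; `log_h` is assumed to hold the current values of $\ln h_1, \ldots, \ln h_n$.

```python
import random


def draw_sigma_v2(log_h, a0, a1, m_df, lam, rng=random):
    """Draw sigma_v^2 from its conditional posterior given H and alpha.

    Prior: (m_df * lam) / sigma_v^2 ~ chi-squared with m_df degrees of freedom.
    """
    # Volatility innovations v_t = ln h_t - alpha_0 - alpha_1 ln h_{t-1}, t = 2..n.
    v = [log_h[t] - a0 - a1 * log_h[t - 1] for t in range(1, len(log_h))]
    scale = m_df * lam + sum(e * e for e in v)
    df = m_df + len(v)                      # m + n - 1 degrees of freedom
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-squared(df) variate
    return scale / chi2
```

The same two lines (a chi-squared variate followed by a ratio) apply to the $\sigma^2$ draws in Sections 12.5 and 12.6, with the degrees of freedom and sum of squares changed accordingly.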


Remark. Formula (12.23) is for $1 < t < n$, where $n$ is the sample size. For 
the two end data points $h_1$ and $h_n$, some modifications are needed. A simple 
approach is to assume that $h_1$ is fixed so that the drawing of $h_t$ starts with $t = 2$. 
For $t = n$, one uses the result $\ln h_n \sim N(\alpha_0 + \alpha_1 \ln h_{n-1}, \sigma_v^2)$. Alternatively, one 
can employ a forecast of $h_{n+1}$ and a backward prediction of $h_0$ and continue to 
apply the formula. Since $h_n$ is the variable of interest, we forecast $h_{n+1}$ by using 
a 2-step-ahead forecast at the forecast origin $n - 1$. For the model in Eq. (12.21), 
the forecast, stated on the log scale for $\ln h_{n+1}$, is 

$$\hat{h}_{n-1}(2) = \alpha_0 + \alpha_1(\alpha_0 + \alpha_1 \ln h_{n-1}).$$

The backward prediction of $h_0$ is based on the time reversibility of the model 

$$(\ln h_t - \eta) = \alpha_1(\ln h_{t-1} - \eta) + v_t,$$

where $\eta = \alpha_0/(1 - \alpha_1)$ and $|\alpha_1| < 1$. The model of the reversed series is 

$$(\ln h_t - \eta) = \alpha_1(\ln h_{t+1} - \eta) + v_t^*,$$

where $\{v_t^*\}$ is also a Gaussian white noise series with mean zero and variance $\sigma_v^2$. 
Consequently, the 2-step-backward prediction of $h_0$ at time $t = 2$, stated for the 
deviation $\ln h_0 - \eta$, is 

$$\hat{h}_2(-2) = \alpha_1^2(\ln h_2 - \eta).$$
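
The two end-point predictions are simple arithmetic, as the following Python sketch shows. The function name `endpoint_log_predictions` is hypothetical, `log_h` is assumed to hold $\ln h_1, \ldots, \ln h_n$, and both results are returned on the log scale, so the backward prediction has $\eta$ added back to the deviation $\alpha_1^2(\ln h_2 - \eta)$.

```python
def endpoint_log_predictions(log_h, a0, a1):
    """Predictions of ln h_{n+1} and ln h_0 for the AR(1) model of Eq. (12.21).

    log_h : [ln h_1, ..., ln h_n] (needs at least two elements)
    Requires |a1| < 1 so that eta = a0 / (1 - a1) is well defined.
    """
    eta = a0 / (1.0 - a1)
    # 2-step-ahead forecast of ln h_{n+1} made at forecast origin n - 1.
    forward = a0 + a1 * (a0 + a1 * log_h[-2])
    # 2-step-backward prediction of ln h_0 made from t = 2 (reversed series).
    backward = eta + a1 ** 2 * (log_h[1] - eta)
    return forward, backward
```

Exponentiating either value gives a convenient plug-in for $h_{n+1}$ or $h_0$ when applying Eq. (12.23) at $t = n$ or $t = 1$.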


Remark. Formula (12.23) can also be obtained by using results of a missing 
value in an AR(1) model; see Section 12.6.1. Specifically, assume that $\ln h_t$ is 
missing. For the AR(1) model in Eq. (12.21), this missing value is related to 
$\ln h_{t-1}$ and $\ln h_{t+1}$ for $1 < t < n$. From the model, we have 

$$\ln h_t = \alpha_0 + \alpha_1 \ln h_{t-1} + v_t.$$

Define $y_t = \alpha_0 + \alpha_1 \ln h_{t-1}$, $x_t = 1$, and $b_t = -v_t$. Then we obtain 

$$y_t = x_t \ln h_t + b_t. \tag{12.24}$$

Next, from 

$$\ln h_{t+1} = \alpha_0 + \alpha_1 \ln h_t + v_{t+1},$$

we define $y_{t+1} = \ln h_{t+1} - \alpha_0$, $x_{t+1} = \alpha_1$, and $b_{t+1} = v_{t+1}$ and obtain 

$$y_{t+1} = x_{t+1} \ln h_t + b_{t+1}. \tag{12.25}$$

Now Eqs. (12.24) and (12.25) form a special simple linear regression with two 
observations and an unknown parameter $\ln h_t$. Note that $b_t$ and $b_{t+1}$ have the 
same distribution because $-v_t$ is also $N(0, \sigma_v^2)$. The least-squares estimate of $\ln h_t$ 
is then 

$$\widehat{\ln h_t} = \frac{x_t y_t + x_{t+1} y_{t+1}}{x_t^2 + x_{t+1}^2} = \frac{\alpha_0(1 - \alpha_1) + \alpha_1(\ln h_{t+1} + \ln h_{t-1})}{1 + \alpha_1^2},$$

which is precisely the conditional mean $\mu_t$ of $\ln h_t$ given in Eq. (12.23). In addition, 
this estimate is normally distributed with mean $\ln h_t$ and variance $\sigma_v^2/(1 + \alpha_1^2)$. 
Formula (12.23) is simply the product of $a_t \sim N(0, h_t)$ and 
$\ln h_t \sim N[\widehat{\ln h_t}, \sigma_v^2/(1 + \alpha_1^2)]$ with the transformation $d \ln h_t = h_t^{-1}\, dh_t$. 
This regression approach generalizes easily to other AR($p$) models for $\ln h_t$. We use 
this approach and assume that $\{h_t\}_{t=1}^{p}$ are fixed for a stochastic volatility AR($p$) model. 


Remark. Starting values of $h_t$ can be obtained by fitting a volatility model of 
Chapter 3 to the return series. 


Example 12.3. Consider the monthly log returns of the S&P 500 index from 
January 1962 to December 2009 for 575 observations. The returns are computed 
using the first adjusted closing index of each month, that is, the closing index 
of the first trading day of each month. Figure 12.4(a) shows the time plot of the 
log level of the index, whereas Figure 12.4(b) shows the log returns measured in 
percentage. If GARCH models are entertained for the series, we obtain a Gaussian 
GARCH(1,1) model 

$$r_t = 0.552 + a_t, \qquad a_t = \sqrt{h_t}\,\epsilon_t,$$

$$h_t = 0.878 + 0.125 a_{t-1}^2 + 0.837 h_{t-1}, \tag{12.26}$$

where $t$ ratios of the coefficients are all greater than 2.56. The Ljung-Box statistics 
of the standardized residuals and their squared series fail to indicate any model 
inadequacy. Specifically, we have $Q(12) = 10.04\,(0.61)$ and $6.14\,(0.91)$, respectively, 
for the standardized residuals and their squared series. 
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Figure 12.4 Time plot of monthly S&P 500 index from 1962 to 2009: (a) log level and (b) log return 
in percentage. 


Next, consider the stochastic volatility model 


$$r_t = \mu + a_t, \qquad a_t = \sqrt{h_t}\,\epsilon_t,$$
$$\ln h_t = \alpha_0 + \alpha_1 \ln h_{t-1} + v_t, \tag{12.27}$$

where the $v_t$ are iid $N(0, \sigma_v^2)$. To implement the Gibbs sampling, we use the prior distributions

$$\mu \sim N(0, 4), \qquad \alpha \sim N[\alpha_o, \mathrm{diag}(0.25, 0.04)], \qquad \frac{10 \times 0.1}{\sigma_v^2} \sim \chi^2_{10},$$

where $\alpha_o = (0, 0.6)'$. For initial parameter values, we use the fitted values of the GARCH(1,1) model in Eq. (12.26) for $\{h_t\}$, that is, $h_{0t} = \hat{h}_t$, and set $\alpha$ and $\sigma_v^2$ to the least-squares estimates obtained from $\ln(h_{0t})$. The initial value of $\mu$ is the sample mean of the log returns. The volatility $h_t$ is drawn by the Griddy Gibbs with 400 grid points. The possible range of $h_t$ for the $j$th Gibbs iteration is $[\eta_{1t}, \eta_{2t}]$, where $\eta_{1t} = 0.6 \times \max(h_{j-1,t}, h_{0t})$ and $\eta_{2t} = 1.4 \times \min(h_{j-1,t}, h_{0t})$, and where $h_{j-1,t}$ and $h_{0t}$ denote, respectively, the estimate of $h_t$ for the $(j-1)$th iteration and the initial value.
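A single Griddy Gibbs draw of $h_t$ can be sketched as follows. This is an illustration under stated assumptions, not the program used for the example: the unnormalized density $h^{-1.5}\exp[-a_t^2/(2h)]\exp[-(\ln h - \mu_t)^2/(2\sigma^2)]$ is the product form described for formula (12.23), the grid endpoints are supplied by the caller, and all function names and numbers are hypothetical.

```python
import math
import random


def log_density_h(h, a_t, mu_t, sigma2):
    """Log of the unnormalized conditional density of h_t (assumed form, see lead-in)."""
    return (-1.5 * math.log(h)
            - a_t ** 2 / (2.0 * h)
            - (math.log(h) - mu_t) ** 2 / (2.0 * sigma2))


def griddy_gibbs_draw(log_density, lo, hi, n_grid=400, rng=random):
    """Draw one value from a density evaluated on an equally spaced grid over [lo, hi]."""
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    logw = [log_density(x) for x in grid]
    top = max(logw)                              # subtract the maximum to avoid underflow
    weights = [math.exp(v - top) for v in logw]
    u = rng.random() * sum(weights)              # inverse-CDF lookup on the grid
    cum = 0.0
    for x, w in zip(grid, weights):
        cum += w
        if cum >= u:
            return x
    return grid[-1]                              # guard against rounding at the upper end


rng = random.Random(1)
draw = griddy_gibbs_draw(lambda h: log_density_h(h, a_t=2.0, mu_t=2.8, sigma2=0.05),
                         lo=10.0, hi=25.0, n_grid=400, rng=rng)
```

Because the draw is restricted to the grid, a range that is too narrow truncates the density and a range that is too wide wastes grid points in regions of negligible mass, which is why the results depend on how the range is specified.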

We ran the Gibbs sampling for 2500 iterations but discarded the results of the first 500 iterations. Figure 12.5 shows the density functions of the prior and posterior distributions of the four coefficient parameters. The prior distributions used are relatively noninformative. The posterior distributions are concentrated, especially for $\mu$ and $\sigma_v^2$. Figure 12.6 shows the time plots of fitted volatilities. The upper panel shows the posterior mean of $h_t$ over the 2000 retained iterations for each time point, whereas the lower panel shows the fitted values of the GARCH(1,1) model in Eq. (12.26). The two plots exhibit a similar pattern.

Figure 12.5 Density functions of prior and posterior distributions of parameters in stochastic volatility model for monthly log returns of S&P 500 index. Dashed line denotes prior density and solid line the posterior density, which is based on results of Gibbs sampling with 2000 iterations. See text for more details.


The posterior mean and standard error of the four coefficients are as follows:

Parameter         $\mu$     $\alpha_0$   $\alpha_1$   $\sigma_v^2$
Mean              0.409     0.454        0.837        0.086
Standard error    0.157     0.068        0.025        0.007


The posterior mean of $\alpha_1$ is 0.837, confirming strong serial dependence in the volatility series. This value is smaller than that obtained by Jacquier, Polson, and Rossi (1994), who used daily returns of the S&P 500 index. Finally, we have used different initial values, priors, and numbers of iterations for the Gibbs sampler. The results are stable. Of course, as expected, the results and efficiency of the Griddy Gibbs algorithm depend on the specification of the range for $h_t$.

Figure 12.6 Time plots of fitted volatilities for monthly log returns of S&P 500 index from 1962 to 2009. Lower panel shows posterior means of a Gibbs sampler with 2000 iterations. Upper panel shows results of a Gaussian GARCH(1,1) model.


12.7.2 Multivariate Stochastic Volatility Models 


In this section, we study multivariate stochastic volatility models using the Cholesky 
decomposition of Chapter 10. We focus on the bivariate case, but the methods 
discussed also apply to the higher dimensional case. Based on the Cholesky decom- 
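position, the innovation $a_t$ of a return series $r_t$ is transformed into $b_t$ such that

$$b_{1t} = a_{1t}, \qquad b_{2t} = a_{2t} - q_{21,t}\, b_{1t},$$

where $b_{2t}$ and $q_{21,t}$ can be interpreted as the residual and least-squares estimate of the linear regression

$$a_{2t} = q_{21,t}\, a_{1t} + b_{2t}.$$

The conditional covariance matrix of $a_t$ is parameterized by $\{g_{11,t}, g_{22,t}\}$ and $\{q_{21,t}\}$ as

$$\Sigma_t \equiv \begin{bmatrix} \sigma_{11,t} & \sigma_{12,t} \\ \sigma_{21,t} & \sigma_{22,t} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ q_{21,t} & 1 \end{bmatrix} \begin{bmatrix} g_{11,t} & 0 \\ 0 & g_{22,t} \end{bmatrix} \begin{bmatrix} 1 & q_{21,t} \\ 0 & 1 \end{bmatrix}, \tag{12.28}$$

where $g_{ii,t} = \mathrm{Var}(b_{it} \mid F_{t-1})$ and $b_{1t} \perp b_{2t}$. Thus, the quantities of interest are $g_{11,t}$, $g_{22,t}$, and $q_{21,t}$.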
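Multiplying out Eq. (12.28) gives $\sigma_{11,t} = g_{11,t}$, $\sigma_{21,t} = q_{21,t} g_{11,t}$, and $\sigma_{22,t} = q_{21,t}^2 g_{11,t} + g_{22,t}$. The short sketch below recovers these elements and the implied correlation from the three quantities of interest; the function name and the input values are illustrative assumptions.

```python
import math


def cov_from_cholesky(g11, g22, q21):
    """Rebuild the bivariate conditional covariance elements from Eq. (12.28)."""
    s11 = g11                        # variance of the first innovation
    s21 = q21 * g11                  # covariance
    s22 = q21 * q21 * g11 + g22      # variance of the second innovation
    rho = s21 / math.sqrt(s11 * s22) # implied conditional correlation
    return s11, s21, s22, rho


# Hypothetical values for one time point
s11, s21, s22, rho = cov_from_cholesky(g11=50.0, g22=12.0, q21=0.37)
```

Any positive $g_{11,t}$ and $g_{22,t}$ yield a positive-definite covariance matrix, whatever the value of $q_{21,t}$, which is the main attraction of this parameterization.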

A simple bivariate stochastic volatility model for the return $r_t = (r_{1t}, r_{2t})'$ is as follows:

$$r_t = \beta_0 + \beta_1 x_t + a_t, \tag{12.29}$$
$$\ln g_{ii,t} = \alpha_{i0} + \alpha_{i1} \ln g_{ii,t-1} + v_{it}, \qquad i = 1, 2, \tag{12.30}$$
$$q_{21,t} = \gamma_0 + \gamma_1 q_{21,t-1} + u_t, \tag{12.31}$$

where $\{a_t\}$ is a sequence of serially uncorrelated Gaussian random vectors with mean zero and conditional covariance matrix $\Sigma_t$ given by Eq. (12.28), $\beta_0$ is a two-dimensional constant vector, $x_t$ denotes the explanatory variables, and $\{v_{1t}\}$, $\{v_{2t}\}$, and $\{u_t\}$ are three independent Gaussian white noise series such that $\mathrm{Var}(v_{it}) = \sigma_{iv}^2$ and $\mathrm{Var}(u_t) = \sigma_u^2$. Again, the log transformation is used in Eq. (12.30) to ensure the positiveness of $g_{ii,t}$.

Let $G_i = (g_{ii,1}, \ldots, g_{ii,n})'$, $G = [G_1, G_2]$, and $Q = (q_{21,1}, \ldots, q_{21,n})'$. The "traditional" parameters of the model in Eqs. (12.29)-(12.31) are $\beta = (\beta_0, \beta_1)$, $\alpha_i = (\alpha_{i0}, \alpha_{i1})'$ and $\sigma_{iv}^2$ for $i = 1, 2$, and $\gamma = (\gamma_0, \gamma_1)'$ and $\sigma_u^2$. The augmented parameters are $Q$, $G_1$, and $G_2$. To estimate such a bivariate stochastic volatility model via Gibbs sampling, we use results of the univariate model in the previous section and two additional conditional posterior distributions. Specifically, we can draw random samples of

1. $\beta_0$ and $\beta_1$ row by row using the result (12.22);

2. $g_{11,t}$ using Eq. (12.23) with $a_t$ being replaced by $a_{1t}$;

3. $\alpha_1$ and $\sigma_{1v}^2$ using exactly the same methods as those of the univariate case with $a_t$ replaced by $a_{1t}$.


To draw random samples of $\alpha_2$, $\sigma_{2v}^2$, and $g_{22,t}$, we need to compute $b_{2t}$. But this is easy because $b_{2t} = a_{2t} - q_{21,t}\, a_{1t}$ given the augmented parameter vector $Q$. Furthermore, $b_{2t}$ is normally distributed with mean 0 and conditional variance $g_{22,t}$. It remains to consider the conditional posterior distributions

$$f(\gamma \mid Q, \sigma_u^2), \qquad f(\sigma_u^2 \mid Q, \gamma), \qquad f(q_{21,t} \mid A, G, Q_{-t}, \gamma, \sigma_u^2),$$


where $A$ denotes the collection of $a_t$, which is known if $R$, $X$, $\beta_0$, and $\beta_1$ are given. Given $Q$ and $\sigma_u^2$, model (12.31) is a simple Gaussian AR(1) model. Therefore, if the prior distribution of $\gamma$ is bivariate normal with mean $\gamma_o$ and covariance matrix $D_o$, then the conditional posterior distribution of $\gamma$ is also bivariate normal with mean $\gamma_*$ and covariance matrix $D_*$, where

$$D_*^{-1} = \frac{\sum_{t=2}^{n} z_t z_t'}{\sigma_u^2} + D_o^{-1}, \qquad \gamma_* = D_* \left( \frac{\sum_{t=2}^{n} z_t q_{21,t}}{\sigma_u^2} + D_o^{-1} \gamma_o \right),$$

where $z_t = (1, q_{21,t-1})'$. Similarly, if the prior distribution of $\sigma_u^2$ is $(m\lambda)/\sigma_u^2 \sim \chi^2_m$, then the conditional posterior distribution of $\sigma_u^2$ is

$$\frac{m\lambda + \sum_{t=2}^{n} u_t^2}{\sigma_u^2} \sim \chi^2_{m+n-1},$$

where $u_t = q_{21,t} - \gamma_0 - \gamma_1 q_{21,t-1}$. Finally,


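$$f(q_{21,t} \mid A, G, Q_{-t}, \sigma_u^2, \gamma) \propto f(b_{2t} \mid g_{22,t})\, f(q_{21,t} \mid q_{21,t-1}, \gamma, \sigma_u^2)\, f(q_{21,t+1} \mid q_{21,t}, \gamma, \sigma_u^2)$$
$$\propto g_{22,t}^{-0.5} \exp\left[-\frac{(a_{2t} - q_{21,t}\, a_{1t})^2}{2 g_{22,t}}\right] \exp\left[-\frac{(q_{21,t} - \mu_t)^2}{2\sigma^2}\right], \tag{12.32}$$

where $\mu_t = [\gamma_0(1 - \gamma_1) + \gamma_1(q_{21,t-1} + q_{21,t+1})]/(1 + \gamma_1^2)$ and $\sigma^2 = \sigma_u^2/(1 + \gamma_1^2)$. In general, $\mu_t$ and $\sigma^2$ can be obtained by using the results of a missing value in an AR($p$) process. It turns out that Eq. (12.32) has a closed-form distribution for $q_{21,t}$. Specifically, the first term of Eq. (12.32), which is the conditional distribution of $q_{21,t}$ given $g_{22,t}$ and $a_t$, is normal with mean $a_{2t}/a_{1t}$ and variance $g_{22,t}/a_{1t}^2$. The second term of the equation is also normal with mean $\mu_t$ and variance $\sigma^2$. Consequently, by Result 12.1, the conditional posterior distribution of $q_{21,t}$ is normal with mean $\mu_*$ and variance $\sigma_*^2$, where

$$\frac{1}{\sigma_*^2} = \frac{a_{1t}^2}{g_{22,t}} + \frac{1 + \gamma_1^2}{\sigma_u^2}, \qquad \mu_* = \sigma_*^2 \left( \frac{1 + \gamma_1^2}{\sigma_u^2} \times \mu_t + \frac{a_{1t}^2}{g_{22,t}} \times \frac{a_{2t}}{a_{1t}} \right),$$

where $\mu_t$ is defined in Eq. (12.32).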
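The combination of the two normal terms is a precision-weighted average, which the following sketch computes. It assumes the AR(1) form of Eq. (12.31); the function name and inputs are illustrative. The regression term is written as $a_{1t} a_{2t}/g_{22,t}$, which equals $(a_{1t}^2/g_{22,t})(a_{2t}/a_{1t})$ but avoids dividing by $a_{1t}$ when it is zero.

```python
def q21_posterior(a1, a2, g22, gamma0, gamma1, sigma2_u, q_prev, q_next):
    """Mean and variance of the normal conditional posterior of q_{21,t}."""
    # AR(1) "missing value" term: mean mu_t and precision (1 + gamma1^2) / sigma_u^2
    mu_t = (gamma0 * (1.0 - gamma1) + gamma1 * (q_prev + q_next)) / (1.0 + gamma1 ** 2)
    prec_ar = (1.0 + gamma1 ** 2) / sigma2_u
    # Regression term a2 = q21 * a1 + b2 with Var(b2) = g22: precision a1^2 / g22
    prec_reg = a1 * a1 / g22
    var_star = 1.0 / (prec_ar + prec_reg)
    mean_star = var_star * (prec_ar * mu_t + a1 * a2 / g22)
    return mean_star, var_star


# Hypothetical inputs with gamma1 = 0, so that mu_t reduces to gamma0
mean_star, var_star = q21_posterior(a1=2.0, a2=1.0, g22=4.0,
                                    gamma0=0.4, gamma1=0.0, sigma2_u=0.1,
                                    q_prev=0.3, q_next=0.5)
```

A draw of $q_{21,t}$ is then a normal variate with this mean and variance, for instance `random.gauss(mean_star, var_star ** 0.5)`.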


Example 12.4. In this example, we study bivariate volatility models for the 
monthly log returns of IBM stock and the S&P 500 index from January 1962 
to December 2009. This is an expanded version of Example 12.3 by adding the 
IBM returns. Figure 12.7 shows the time plots of the two return series. Let $r_t = (\mathrm{IBM}_t, \mathrm{SP}_t)'$. If time-varying correlation GARCH models with Cholesky decomposition of Chapter 10 are entertained, we obtain the model

$$r_t = \beta_0 + a_t, \tag{12.33}$$
$$g_{11,t} = \alpha_{10} + \alpha_{11}\, g_{11,t-1} + \alpha_{12}\, a_{1,t-1}^2, \tag{12.34}$$
$$g_{22,t} = \alpha_{20} + \alpha_{21}\, b_{2,t-1}^2, \tag{12.35}$$
$$q_{21,t} = \gamma_0, \tag{12.36}$$

where $b_{2t} = a_{2t} - q_{21,t}\, a_{1t}$ and the estimates and their standard errors are given in Table 12.2(a). For comparison purposes, we also fit a BEKK(1,1) model and obtain $\hat{\beta}_0 = (0.70, 0.54)'$ and the coefficient matrices


0.80 0.07 0.33 1.00 —0.12 
a=| oes 0.01 |: a= | oe alk majou 0.90 i 


where the matrices are defined in Eq. (10.6) of Chapter 10.

Figure 12.7 Time plots of monthly log returns of (a) IBM stock and (b) S&P 500 index from 1962 to 2009.


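For the stochastic volatility model, we employ the same mean equation in Eq. (12.33) and a stochastic volatility model similar to that in Eqs. (12.34)-(12.36). The volatility equations are

$$\ln g_{11,t} = \alpha_{10} + \alpha_{11} \ln g_{11,t-1} + v_{1t}, \qquad \mathrm{Var}(v_{1t}) = \sigma_{1v}^2, \tag{12.37}$$
$$\ln g_{22,t} = \alpha_{20} + \alpha_{21} \ln g_{22,t-1} + v_{2t}, \qquad \mathrm{Var}(v_{2t}) = \sigma_{2v}^2, \tag{12.38}$$
$$q_{21,t} = \gamma_0 + u_t, \qquad \mathrm{Var}(u_t) = \sigma_u^2. \tag{12.39}$$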


The prior distributions used are 


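$$\beta_{i0} \sim N(0, 4), \qquad \alpha_i \sim N[(0, 0.7)', \mathrm{diag}(0.25, 0.04)],$$

$$\gamma_0 \sim N(0, 1), \qquad \frac{10 \times 0.1}{\sigma_{iv}^2} \sim \chi^2_{10}, \qquad \frac{5 \times 0.2}{\sigma_u^2} \sim \chi^2_5,$$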


where $i = 1$ and 2. These prior distributions are relatively noninformative. We obtained the initial values of $\{g_{11,t}, g_{22,t}, q_{21,t}\}$ from the results of the BEKK(1,1) model. In addition, we set the values of quantities at $t = 1$ as given. We then ran the Gibbs sampling for 2500 iterations but discarded the results of the first 500 iterations. The random samples of $g_{ii,t}$ were drawn by Griddy Gibbs with 500 grid points in the intervals $[\eta_{i1t}, \eta_{i2t}]$, where the lower and upper bounds are set by the same method as those of Example 12.3. Posterior means and standard errors of the "traditional" parameters of the bivariate stochastic volatility model are given in Table 12.2(b).

TABLE 12.2 Estimation of Bivariate Volatility Models for Monthly Log Returns of IBM Stock and S&P 500 Index from January 1962 to December 2009^a

(a) Bivariate GARCH(1,1) Model With Time-Varying Correlations

Parameter        $\beta_{01}$  $\beta_{02}$  $\alpha_{10}$  $\alpha_{11}$  $\alpha_{12}$  $\alpha_{20}$  $\alpha_{21}$  $\gamma_0$
Estimate         0.69          0.49          3.98           0.80           0.12           10.67          0.12           0.37
Standard error   0.30          0.18          1.22           0.04           0.03           0.53           0.04           0.01

(b) Stochastic Volatility Model

Parameter        $\beta_{01}$  $\beta_{02}$  $\alpha_{10}$  $\alpha_{11}$  $\sigma_{1v}^2$  $\alpha_{20}$  $\alpha_{21}$  $\sigma_{2v}^2$  $\gamma_0$  $\sigma_u^2$
Posterior mean   0.53          0.51          0.75           0.80           0.07             0.43           0.81           0.07             0.38        0.07
Standard error   0.26          0.17          0.11           0.03           0.01             0.06           0.03           0.01             0.03        0.01

^a The stochastic volatility models are based on the last 2000 iterations of a Gibbs sampling with 2500 total iterations.

To check for convergence of the Gibbs sampling, we ran the procedure several 
times with different starting values and numbers of iterations. The results are stable. 
For illustration, Figure 12.8 shows the scatterplots of various quantities for two 
different Gibbs samples. The first Gibbs sample is based on 500 + 2000 iterations, 
and the second Gibbs sample is based on 500 + 1000 iterations, where M + N 
denotes that the total number of Gibbs iterations is M + N, but results of the 
first $M$ iterations are discarded. The scatterplots shown are posterior means of $g_{11,t}$, $g_{22,t}$, $q_{21,t}$, $\sigma_{22,t}$, $\sigma_{21,t}$, and the correlation $\rho_{21,t}$. The line $y = x$ is added to each plot to show the closeness of the posterior means. The stability of the Gibbs sampling results is clearly seen.

It is informative to compare the BEKK model and the GARCH model with time- 
varying correlations in Eqs. (12.33)—(12.36) with the stochastic volatility model. 
First, as expected, the mean equations of the three models are essentially iden- 
tical. Second, Figure 12.9 shows the time plots of the conditional variance for 
IBM stock return. Figure 12.9(a) is for the GARCH model, Figure 12.9(b) is from 
the BEKK model, and Figure 12.9(c) shows the posterior mean of the stochas- 
tic volatility model. The three models show similar volatility characteristics; they 
exhibit volatility clustering and indicate an increasing trend in volatility. How- 
ever, the GARCH model produces higher peak volatility values and an additional 
peak in 1993. Third, Figure 12.10 shows the time plots of conditional variance 
for the S&P 500 index return. The GARCH model produces an extra volatility 
peak around 1993. This additional peak does not appear in the univariate analy- 
sis shown in Figure 12.6. It seems that for this particular instance the bivariate 
GARCH model produces a spurious volatility peak. This spurious peak is induced 
by its dependence on IBM returns and does not appear in the stochastic volatility 
model or the BEKK model. Indeed, the fitted volatilities of the S&P 500 index return by the bivariate stochastic volatility model are similar to those of the univariate analysis. Fourth, Figure 12.11 shows the time plots of fitted conditional correlations. Here the three models differ substantially. The correlations of the GARCH model with Cholesky decomposition are relatively smooth and always positive with mean value 0.59 and standard deviation 0.07. The range of the correlations is (0.411, 0.849). The correlations of the BEKK(1,1) model assume small negative values around 1993 and are more variable with mean 0.59, standard deviation 0.13, and range (-0.020, 0.877). However, the correlations produced by the stochastic volatility model vary markedly from one month to another with mean value 0.60, standard deviation 0.14, and range (-0.161, 0.839). Furthermore, the negative correlations occur in several isolated periods. The difference is understandable because $q_{21,t}$ contains the random shock $u_t$ in the stochastic volatility model.

Figure 12.8 Scatterplots of posterior means of various statistics of two different Gibbs samples for bivariate stochastic volatility model for monthly log returns of IBM stock and S&P 500 index. The x axis denotes results based on 500 + 2000 iterations and the y axis denotes results based on 500 + 1000 iterations. Notation is defined in text.


Remark. The Gibbs sampling estimation applies to other bivariate stochas- 
tic volatility models. The conditional posterior distributions needed require some 
extensions of those discussed in this section, but they are based on the same ideas. 
The BEKK model is estimated by using Matlab.

Figure 12.9 Time plots of fitted conditional variance for monthly log returns of IBM stock from 1962 to 2009: (a) GARCH model with time-varying correlations, (b) BEKK(1,1) model, and (c) bivariate stochastic volatility model estimated by Gibbs sampling with 500 + 2000 iterations.


12.8 NEW APPROACH TO SV ESTIMATION 


In this section, we discuss an alternative procedure to estimate stochastic volatility 
(SV) models. This approach makes use of the technique of forward filtering and 
backward sampling (FFBS) within the Kalman filter framework to improve the 
efficiency of Gibbs sampling. It can dramatically reduce the computing time by 
drawing the volatility process jointly with the help of a mixture of normal distri- 
butions. In fact, the approach can be used to estimate many stochastic diffusion 
models with leverage effects and jumps. 

For ease in presentation, we reparameterize the univariate stochastic volatility 
model in Eqs. (12.20) and (12.21) as 


$$r_t = x_t'\beta + \sigma_0 \exp\left(\frac{z_t}{2}\right) \epsilon_t, \tag{12.40}$$
$$z_{t+1} = \alpha z_t + \eta_t, \tag{12.41}$$

where $x_t = (1, x_{1t}, \ldots, x_{pt})'$, $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$, $\sigma_0 > 0$, $\{z_t\}$ is a zero-mean log volatility series, and $(\epsilon_t, \eta_t)'$ is bivariate normal with mean zero and covariance matrix

$$\Sigma = \begin{bmatrix} 1 & \rho\sigma_\eta \\ \rho\sigma_\eta & \sigma_\eta^2 \end{bmatrix}.$$

The parameter $\rho$ is the correlation between $\epsilon_t$ and $\eta_t$ and represents the leverage effect of the asset return $r_t$. Typically, $\rho$ is negative, signifying that a negative return tends to increase the volatility of an asset price.

Figure 12.10 Time plots of conditional variance for monthly log returns of S&P 500 index from 1962 to 2009: (a) GARCH model with time-varying correlations, (b) BEKK(1,1) model, and (c) bivariate stochastic volatility model estimated by Gibbs sampling with 500 + 2000 iterations.

Compared with the model in Eqs. (12.20) and (12.21), we have $z_t = \ln(h_t) - \ln(\sigma_0^2)$ and $\sigma_0^2 = \exp\{E[\ln(h_t)]\}$. That is, $z_t$ is a mean-adjusted log volatility series. This new parameterization has some nice characteristics. For example, the volatility series is $\sigma_0 \exp(z_t/2)$, which is always positive. More importantly, $\eta_t$ is the innovation of $z_{t+1}$ and is independent of $z_t$. This simple time shift enables us to handle the leverage effect. If one postulates $z_t = \alpha z_{t-1} + \eta_t$ for Eq. (12.41), then $\eta_t$ and $\epsilon_t$ cannot be correlated because a nonzero correlation implies that $z_t$ and $\epsilon_t$ are correlated in Eq. (12.40), which would lead to some identifiability issues.

Figure 12.11 Time plots of fitted correlation coefficients between monthly log returns of IBM stock and S&P 500 index from 1962 to 2009: (a) GARCH model with time-varying correlations, (b) BEKK(1,1) model, and (c) bivariate stochastic volatility model estimated by Gibbs sampling with 500 + 2000 iterations.
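The timing convention of Eqs. (12.40) and (12.41) is easiest to see in a simulation. The sketch below is an illustration under simplifying assumptions: the regression term $x_t'\beta$ is reduced to a constant `mu`, the initial state is set to zero, and the function name and parameter values are hypothetical.

```python
import math
import random


def simulate_sv(n, mu, sigma0, alpha, sigma_eta, rho, seed=0):
    """Simulate returns and log volatility from the SV model with leverage."""
    rng = random.Random(seed)
    z = 0.0                       # z_1, taken as known (assumed equal to its mean)
    returns, log_vol = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        # eta_t has variance sigma_eta^2 and correlation rho with eps_t
        eta = sigma_eta * (rho * eps + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
        returns.append(mu + sigma0 * math.exp(z / 2.0) * eps)   # Eq. (12.40)
        log_vol.append(z)
        z = alpha * z + eta       # Eq. (12.41): eta_t drives z_{t+1}, not z_t
    return returns, log_vol


returns, log_vol = simulate_sv(n=1000, mu=0.0, sigma0=1.0, alpha=0.9,
                               sigma_eta=0.3, rho=-0.4, seed=1)
```

Because $\eta_t$ enters only $z_{t+1}$, the shock $\epsilon_t$ that moves today's return is independent of today's volatility level but correlated with tomorrow's, which is how the leverage effect is represented.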


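Remark. Alternatively, one can write the stochastic volatility model as

$$r_t = x_t'\beta + \sigma_0 \exp\left(\frac{z_{t-1}}{2}\right) \epsilon_t,$$
$$z_t = \alpha z_{t-1} + \eta_t,$$

where $(\epsilon_t, \eta_t)'$ is a bivariate normal distribution as before. Yet another equivalent parameterization is

$$r_t = x_t'\beta + \exp\left(\frac{z_t^*}{2}\right) \epsilon_t,$$
$$z_{t+1}^* = \alpha_0 + \alpha z_t^* + \eta_t,$$

where $E(z_t^*) = \alpha_0/(1 - \alpha)$ is not zero.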


Parameters of the stochastic volatility model in Eqs. (12.40) and (12.41) are $\beta$, $\sigma_0$, $\alpha$, $\rho$, $\sigma_\eta$, and $z = (z_1, \ldots, z_n)'$, where $n$ is the sample size. For simplicity, we assume $z_1$ is known. To estimate these parameters via MCMC methods, we need their conditional posterior distributions. In what follows, we discuss the needed conditional posterior distributions.

1. Given $z$ and $\sigma_0$ and a normal prior distribution, $\beta$ has the same conditional posterior distribution as that in Section 12.7.1 with $\sqrt{h_t}$ replaced by $\sigma_0 \exp(z_t/2)$; see Eq. (12.22).

2. Given $z$ and $\sigma_\eta^2$, $\alpha$ is a simple AR(1) coefficient. Thus, with an approximate normal prior, the conditional posterior distribution of $\alpha$ is readily available; see Section 12.7.1.

3. Given $\beta$ and $z$, we define $v_t = (r_t - x_t'\beta) \exp(-z_t/2) = \sigma_0 \epsilon_t$. Thus, $\{v_t\}$ is a sequence of iid normal random variables with mean zero and variance $\sigma_0^2$. If the prior distribution of $\sigma_0^2$ is $(m\lambda)/\sigma_0^2 \sim \chi^2_m$, then the conditional posterior distribution of $\sigma_0^2$ is an inverted chi-squared distribution with $m + n$ degrees of freedom; that is,

$$\frac{m\lambda + \sum_{t=1}^{n} v_t^2}{\sigma_0^2} \sim \chi^2_{m+n}.$$


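4. Given $\beta$, $\sigma_0$, $z$, and $\alpha$, we can easily obtain the bivariate innovation $b_t = (\epsilon_t, \eta_t)'$ for $t = 2, \ldots, n$. The likelihood function of $(\rho, \sigma_\eta)$ is readily available as

$$L(\rho, \sigma_\eta) = \prod_{t=2}^{n} f(b_t \mid \Sigma) \propto |\Sigma|^{-(n-1)/2} \exp\left(-\frac{1}{2} \sum_{t=2}^{n} b_t' \Sigma^{-1} b_t\right) \propto |\Sigma|^{-(n-1)/2} \exp\left[-\frac{1}{2} \mathrm{tr}\left(\Sigma^{-1} \sum_{t=2}^{n} b_t b_t'\right)\right],$$

where $\mathrm{tr}(A)$ denotes the trace of the matrix $A$. However, this joint distribution is complicated because one cannot separate $\rho$ and $\sigma_\eta$. We adopt the technique of Jacquier, Polson, and Rossi (2004) and reparameterize the covariance matrix as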


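$$\Sigma = \begin{bmatrix} 1 & \rho\sigma_\eta \\ \rho\sigma_\eta & \sigma_\eta^2 \end{bmatrix} = \begin{bmatrix} 1 & \varphi \\ \varphi & \varphi^2 + \omega \end{bmatrix},$$

where $\varphi = \rho\sigma_\eta$ and $\omega = \sigma_\eta^2(1 - \rho^2)$. It is easy to see that $|\Sigma| = \omega$ and

$$\Sigma^{-1} = \frac{1}{\omega} \begin{bmatrix} \varphi^2 + \omega & -\varphi \\ -\varphi & 1 \end{bmatrix} = \frac{1}{\omega} S + \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad S = \begin{bmatrix} \varphi^2 & -\varphi \\ -\varphi & 1 \end{bmatrix},$$

where $S$ contains $\varphi$ only. Let $e = (\epsilon_2, \ldots, \epsilon_n)'$ and $\eta = (\eta_2, \ldots, \eta_n)'$ be the innovations of the model in Eqs. (12.40) and (12.41). The likelihood function then becomes (keeping terms related to parameters only)

$$L(\varphi, \omega) \propto \omega^{-(n-1)/2} \exp\left[-\frac{1}{2\omega} \mathrm{tr}(SR)\right],$$

where $R = \sum_t b_t b_t' = (e, \eta)'(e, \eta)$, which is the $2 \times 2$ cross-product matrix of the innovations. For simplicity, we use conjugate priors such that $\omega$ is inverse gamma (IG) with hyperparameters $(\gamma_0/2, \gamma_1/2)$; that is, $\omega \sim IG(\gamma_0/2, \gamma_1/2)$, and $\varphi \mid \omega \sim N(0, \omega/2)$. Then, after some algebraic manipulation, the joint posterior distribution of $(\varphi, \omega)$ can be decomposed into a normal and an inverse gamma distribution. Specifically,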


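$$\varphi \mid \omega \sim N\left(\tilde{\varphi}, \frac{\omega}{2 + e'e}\right),$$

where $\tilde{\varphi} = e'\eta/(2 + e'e)$, and

$$\omega \sim IG\left(\frac{1}{2}(n + \gamma_0 - 1),\ \frac{1}{2}\left[\gamma_1 + \eta'\eta - \frac{(e'\eta)^2}{2 + e'e}\right]\right).$$

In Gibbs sampling, once $\varphi$ and $\omega$ are available, we can obtain $\rho$ and $\sigma_\eta$ easily because $\sigma_\eta^2 = \omega + \varphi^2$ and $\rho = \varphi/\sigma_\eta$. Note that the probability density function of an $IG(\alpha, \beta)$ random variable $\omega$ is

$$f(\omega \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, \omega^{-(\alpha+1)} \exp\left(-\frac{\beta}{\omega}\right), \qquad \text{for } \omega > 0,$$

where $\alpha > 2$ and $\beta > 0$.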
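One Gibbs step for $(\rho, \sigma_\eta)$ under the reparameterization can be sketched as follows. This is an assumption-laden illustration: the inverse gamma shape is taken to be half of (number of innovation pairs plus $\gamma_0$), which is what the stated priors imply when the normal part for $\varphi$ is integrated out, and the function name, hyperparameter values, and simulated innovations are hypothetical.

```python
import math
import random


def draw_rho_sigma_eta(e, eta, gamma0, gamma1, rng):
    """Draw (rho, sigma_eta) via omega ~ IG and phi | omega ~ Normal."""
    m = len(e)                                   # number of innovation pairs (n - 1 in the text)
    ee = sum(x * x for x in e)
    en = sum(x * y for x, y in zip(e, eta))
    nn = sum(y * y for y in eta)
    shape = 0.5 * (m + gamma0)                   # assumed IG shape, see lead-in
    rate = 0.5 * (gamma1 + nn - en * en / (2.0 + ee))
    # If X ~ Gamma(shape, scale 1), then rate / X ~ IG(shape, rate)
    omega = rate / rng.gammavariate(shape, 1.0)
    phi_tilde = en / (2.0 + ee)
    phi = rng.gauss(phi_tilde, math.sqrt(omega / (2.0 + ee)))
    sigma_eta = math.sqrt(omega + phi * phi)     # sigma_eta^2 = omega + phi^2
    rho = phi / sigma_eta
    return rho, sigma_eta


# Hypothetical innovations with correlation -0.5 and sigma_eta = 0.3
rng = random.Random(7)
e_sim, eta_sim = [], []
for _ in range(2000):
    x, w = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    e_sim.append(x)
    eta_sim.append(0.3 * (-0.5 * x + math.sqrt(0.75) * w))
rho_draw, sigma_eta_draw = draw_rho_sigma_eta(e_sim, eta_sim, gamma0=2.0, gamma1=0.1, rng=rng)
```

A useful property of this parameterization is that any draw with $\omega > 0$ maps back to a valid correlation, since $|\varphi| < \sqrt{\omega + \varphi^2}$ guarantees $|\rho| < 1$.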


5. Finally, we consider the joint distribution of the log volatility $z$ given the data and other parameters. From Eq. (12.40), we have

$$\frac{(r_t - x_t'\beta)^2}{\sigma_0^2} = \exp(z_t)\, \epsilon_t^2.$$

Therefore, letting $y_t = \ln[(r_t - x_t'\beta)^2/\sigma_0^2]$, we obtain

$$y_t = z_t + \epsilon_t^*, \tag{12.42}$$

where $\epsilon_t^* = \ln(\epsilon_t^2)$. Since $\epsilon_t^2 \sim \chi^2_1$, $\epsilon_t^*$ is not normally distributed. Treating Eq. (12.42) as an observation equation and Eq. (12.41) as the state equation, we have the form of a state-space model except that $\epsilon_t^*$ is not Gaussian; see Eqs. (11.26) and (11.27). To overcome the difficulty associated with nonnormality, Kim, Shephard, and Chib (1998) use a mixture of seven normal distributions to approximate the distribution of $\epsilon_t^*$. Specifically, we have

$$f(\epsilon_t^*) \approx \sum_{i=1}^{7} p_i\, N(\mu_i, w_i^2),$$

where $p_i$, $\mu_i$, and $w_i^2$ are given in Table 12.3. See also Chib, Nardari, and Shephard (2002).

TABLE 12.3 Seven Components of Normal Distributions

Component i    Probability $p_i$    Mean $\mu_i$    Variance $w_i^2$
1              0.00730              -11.4004        5.7960
2              0.10556              -5.2432         2.6137
3              0.00002              -9.8373         5.1795
4              0.04395              1.5075          0.1674
5              0.34001              -0.6510         0.6401
6              0.24566              0.5248          0.3402
7              0.25750              -2.3586         1.2626

Figure 12.12 Density functions of $\log(\chi^2_1)$, solid line, and that of a mixture of seven normal distributions, dashed line. Results are based on 100,000 observations.


To demonstrate the adequacy of the approximation, Figure 12.12 shows the density function of $\epsilon_t^*$ (solid line) and that of the mixture of seven normals (dashed line) in Table 12.3. These densities are obtained using simulations with 100,000 observations. From the plot, the approximation by the mixture of seven normals is very good.
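A quick numerical check of the approximation compares the first two moments of the mixture in Table 12.3 with those of $\ln(\chi^2_1)$, whose mean is $\psi(1/2) + \ln 2 \approx -1.2704$ and whose variance is $\pi^2/2 \approx 4.9348$. The sketch below is illustrative; the variable and function names are not from the text.

```python
import math

# (p_i, mu_i, w_i^2) copied from Table 12.3
MIXTURE = [
    (0.00730, -11.4004, 5.7960),
    (0.10556, -5.2432, 2.6137),
    (0.00002, -9.8373, 5.1795),
    (0.04395, 1.5075, 0.1674),
    (0.34001, -0.6510, 0.6401),
    (0.24566, 0.5248, 0.3402),
    (0.25750, -2.3586, 1.2626),
]


def mixture_moments(components):
    """Total probability, mean, and variance of a finite normal mixture."""
    total_p = sum(p for p, _, _ in components)
    mean = sum(p * m for p, m, _ in components)
    second = sum(p * (v + m * m) for p, m, v in components)
    return total_p, mean, second - mean * mean


def mixture_density(x, components):
    """Density of the normal mixture at x."""
    return sum(p * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
               for p, m, v in components)


total_p, mix_mean, mix_var = mixture_moments(MIXTURE)
target_var = math.pi ** 2 / 2.0      # variance of ln(chi-squared with 1 d.f.)
```

Matching moments does not by itself prove the densities agree, which is why the text compares the full density curves, but it is an inexpensive way to catch a transcription error in the table.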

Why is it important to have a Gaussian state-space model? The answer is that such a Gaussian model enables us to draw the log volatility series $z$ jointly and efficiently. To see this, consider the following special Gaussian state-space model, where $\eta_t$ and $e_t$ are uncorrelated (i.e., no leverage effects):

$$z_{t+1} = \alpha z_t + \eta_t, \qquad \eta_t \sim_{iid} N(0, \sigma_\eta^2), \tag{12.43}$$
$$y_t = c_t + z_t + e_t, \qquad e_t \sim_{ind} N(0, H_t), \tag{12.44}$$

where, as will be seen later, $(c_t, H_t)$ assumes the value $(\mu_i, w_i^2)$ of Table 12.3 for some $i$. For this special state-space model, we have the Kalman filter algorithm

$$v_t = y_t - y_{t|t-1} = y_t - c_t - z_{t|t-1},$$
$$V_t = \Sigma_{t|t-1} + H_t,$$
$$z_{t|t} = z_{t|t-1} + \Sigma_{t|t-1} V_t^{-1} v_t, \tag{12.45}$$
$$\Sigma_{t|t} = \Sigma_{t|t-1} - \Sigma_{t|t-1} V_t^{-1} \Sigma_{t|t-1},$$
$$z_{t+1|t} = \alpha z_{t|t},$$
$$\Sigma_{t+1|t} = \alpha^2 \Sigma_{t|t} + \sigma_\eta^2,$$

where $V_t = \mathrm{Var}(v_t)$ is the variance of the 1-step-ahead prediction error $v_t$ of $y_t$ given $F_{t-1} = (y_1, \ldots, y_{t-1})$, and $z_{j|i}$ and $\Sigma_{j|i}$ are, respectively, the conditional expectation and variance of the state variable $z_j$ given $F_i$. See the Kalman filter discussion of Chapter 11.
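Because the state is scalar, the recursion in Eq. (12.45) needs no matrix algebra. The following is a minimal sketch; the function name, the return layout, and the numerical inputs are illustrative assumptions.

```python
def kalman_filter(y, c, H, alpha, sigma2_eta, z_init, S_init):
    """Scalar Kalman filter of Eq. (12.45).

    Returns a list of tuples (z_pred, S_pred, z_filt, S_filt), one per time point,
    where z_init and S_init play the role of z_{1|0} and Sigma_{1|0}.
    """
    z_pred, S_pred = z_init, S_init
    out = []
    for y_t, c_t, H_t in zip(y, c, H):
        v = y_t - c_t - z_pred                    # 1-step-ahead prediction error
        V = S_pred + H_t                          # its variance
        z_filt = z_pred + S_pred * v / V          # z_{t|t}
        S_filt = S_pred - S_pred * S_pred / V     # Sigma_{t|t}
        out.append((z_pred, S_pred, z_filt, S_filt))
        z_pred = alpha * z_filt                   # z_{t+1|t}
        S_pred = alpha * alpha * S_filt + sigma2_eta   # Sigma_{t+1|t}
    return out


# Hypothetical inputs: three observations with constant c_t and H_t
filtered = kalman_filter(y=[1.0, 0.4, -0.2], c=[0.0, 0.0, 0.0], H=[1.0, 1.0, 1.0],
                         alpha=0.9, sigma2_eta=0.1, z_init=0.0, S_init=1.0)
```

In the stochastic volatility application, `c` and `H` change from one time point to the next because they are set by the mixture component drawn for each observation.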


Forward Filtering and Backward Sampling 

Let $p(z \mid F_n)$ be the joint conditional posterior distribution of $z$ given the return data and other parameters, where for simplicity the parameters are omitted from the condition set. We can partition the distribution as

$$p(z \mid F_n) = p(z_2, z_3, \ldots, z_n \mid F_n)$$
$$= p(z_n \mid F_n)\, p(z_{n-1} \mid z_n, F_n)\, p(z_{n-2} \mid z_{n-1}, z_n, F_n) \cdots p(z_2 \mid z_3, \ldots, z_n, F_n)$$
$$= p(z_n \mid F_n)\, p(z_{n-1} \mid z_n, F_n)\, p(z_{n-2} \mid z_{n-1}, F_n) \cdots p(z_2 \mid z_3, F_n), \tag{12.46}$$

where the last equality holds because $z_t$ in Eq. (12.43) is a Markov process so that, conditioned on $z_{t+1}$, $z_t$ is independent of $z_{t+j}$ for $j > 1$.

From the Kalman filter in Eq. (12.45), we obtain that $p(z_n \mid F_n)$ is normal with mean $z_{n|n}$ and variance $\Sigma_{n|n}$. Next, consider the second term $p(z_{n-1} \mid z_n, F_n)$ of Eq. (12.46). We have

$$p(z_{n-1} \mid z_n, F_n) = p(z_{n-1} \mid z_n, F_{n-1}, y_n) = p(z_{n-1} \mid z_n, F_{n-1}, v_n), \tag{12.47}$$

where $v_n = y_n - y_{n|n-1}$ is the 1-step-ahead prediction error of $y_n$. From the state-space model in Eqs. (12.43) and (12.44), $z_{n-1}$ is independent of $v_n$. Therefore,

$$p(z_{n-1} \mid z_n, F_n) = p(z_{n-1} \mid z_n, F_{n-1}). \tag{12.48}$$

This is an important property because it implies that we can derive the posterior distribution $p(z_{n-1} \mid z_n, F_n)$ from the joint distribution of $(z_{n-1}, z_n)$ given $F_{n-1}$ via Theorem 11.1. First, the joint distribution is bivariate normal under the Gaussian assumption. Second, the conditional mean and covariance matrix of $(z_{n-1}, z_n)$ given $F_{n-1}$ are readily available from the Kalman filter algorithm in Eq. (12.45). Specifically, we have

$$\begin{bmatrix} z_{n-1} \\ z_n \end{bmatrix}_{F_{n-1}} \sim N\left( \begin{bmatrix} z_{n-1|n-1} \\ z_{n|n-1} \end{bmatrix}, \begin{bmatrix} \Sigma_{n-1|n-1} & \alpha \Sigma_{n-1|n-1} \\ \alpha \Sigma_{n-1|n-1} & \Sigma_{n|n-1} \end{bmatrix} \right), \tag{12.49}$$

where the covariance is obtained by (i) multiplying $z_{n-1}$ by Eq. (12.43) and (ii) taking conditional expectation. Note that all quantities involved in Eq. (12.49) are available from the Kalman filter. Consequently, by Theorem 11.1, we have

$$p(z_{n-1} \mid z_n, F_n) \sim N(\mu_{n-1}^*, \Sigma_{n-1}^*), \tag{12.50}$$

where

$$\mu_{n-1}^* = z_{n-1|n-1} + \alpha \Sigma_{n-1|n-1} \Sigma_{n|n-1}^{-1} (z_n - z_{n|n-1}),$$
$$\Sigma_{n-1}^* = \Sigma_{n-1|n-1} - \alpha^2 \Sigma_{n-1|n-1}^2 \Sigma_{n|n-1}^{-1}.$$


Next, for the conditional posterior distribution $p(z_{n-2} \mid z_{n-1}, F_n)$, we have

$$p(z_{n-2} \mid z_{n-1}, F_n) = p(z_{n-2} \mid z_{n-1}, F_{n-2}, y_{n-1}, y_n)$$
$$= p(z_{n-2} \mid z_{n-1}, F_{n-2}, v_{n-1}, v_n)$$
$$= p(z_{n-2} \mid z_{n-1}, F_{n-2}).$$

Consequently, we can obtain $p(z_{n-2} \mid z_{n-1}, F_n)$ from the bivariate normal distribution of $p(z_{n-2}, z_{n-1} \mid F_{n-2})$ as before. In general, we have

$$p(z_t \mid z_{t+1}, F_n) = p(z_t \mid z_{t+1}, F_t), \qquad \text{for } 1 < t < n.$$


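Furthermore, from the Kalman filter, $p(z_t, z_{t+1} \mid F_t)$ is bivariate normal as

$$\begin{bmatrix} z_t \\ z_{t+1} \end{bmatrix}_{F_t} \sim N\left( \begin{bmatrix} z_{t|t} \\ z_{t+1|t} \end{bmatrix}, \begin{bmatrix} \Sigma_{t|t} & \alpha \Sigma_{t|t} \\ \alpha \Sigma_{t|t} & \Sigma_{t+1|t} \end{bmatrix} \right).$$

Consequently,

$$p(z_t \mid z_{t+1}, F_t) \sim N(\mu_t^*, \Sigma_t^*),$$

where

$$\mu_t^* = z_{t|t} + \alpha \Sigma_{t|t} \Sigma_{t+1|t}^{-1} (z_{t+1} - z_{t+1|t}),$$
$$\Sigma_t^* = \Sigma_{t|t} - \alpha^2 \Sigma_{t|t}^2 \Sigma_{t+1|t}^{-1}.$$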



The preceding derivation implies that we can draw the volatility series $z$ jointly by a recursive method using quantities readily available from the Kalman filter algorithm. That is, given the initial values $z_{1|0}$ and $\Sigma_{1|0}$, one uses the Kalman filter in Eq. (12.45) to process the return data forward, then applies the recursive backward method to draw a realization of the volatility series $z$. This scheme is referred to as forward filtering and backward sampling (FFBS); see Carter and Kohn (1994) and Frühwirth-Schnatter (1994). Because the volatility $\{z_t\}$ is serially correlated, drawing the series jointly is more efficient.
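The forward and backward passes can be combined in a few lines for the scalar model of Eqs. (12.43) and (12.44). The sketch below is illustrative and makes one simplifying assumption that differs from the text: every state, including the first, is drawn starting from a prior mean and variance `z_init` and `S_init`, rather than treating $z_1$ as known. All names and inputs are hypothetical.

```python
import math
import random


def ffbs(y, c, H, alpha, sigma2_eta, z_init, S_init, rng):
    """Forward filtering and backward sampling for the scalar state-space model."""
    n = len(y)
    z_filt, S_filt = [0.0] * n, [0.0] * n
    z_pred, S_pred = z_init, S_init
    # Forward pass: Kalman filter of Eq. (12.45)
    for t in range(n):
        gain = S_pred / (S_pred + H[t])
        z_filt[t] = z_pred + gain * (y[t] - c[t] - z_pred)
        S_filt[t] = S_pred - gain * S_pred
        z_pred = alpha * z_filt[t]
        S_pred = alpha * alpha * S_filt[t] + sigma2_eta
    # Backward pass: draw z_n from N(z_{n|n}, Sigma_{n|n}), then z_t given z_{t+1}
    draw = [0.0] * n
    draw[n - 1] = rng.gauss(z_filt[n - 1], math.sqrt(S_filt[n - 1]))
    for t in range(n - 2, -1, -1):
        S_next = alpha * alpha * S_filt[t] + sigma2_eta      # Sigma_{t+1|t}
        z_next = alpha * z_filt[t]                           # z_{t+1|t}
        mean = z_filt[t] + alpha * S_filt[t] / S_next * (draw[t + 1] - z_next)
        var = S_filt[t] - alpha * alpha * S_filt[t] ** 2 / S_next
        draw[t] = rng.gauss(mean, math.sqrt(var))
    return draw


rng = random.Random(11)
z_draw = ffbs(y=[0.3, -0.2, 0.5, 0.1], c=[0.0] * 4, H=[1.0] * 4,
              alpha=0.9, sigma2_eta=0.1, z_init=0.0, S_init=1.0, rng=rng)
```

The backward variance simplifies to $\Sigma_{t|t}\sigma_\eta^2/\Sigma_{t+1|t}$, which is always positive, so the square root in the backward pass is well defined.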


Remark. The FFBS procedure applies to general linear Gaussian state-space 
models. The main idea is to make use of the Markov property of the model and 
the structure of the state transition equation so that 


$$p(S_t \mid S_{t+1}, F_n) = p(S_t \mid S_{t+1}, F_t, v_{t+1}, \ldots, v_n) = p(S_t \mid S_{t+1}, F_t),$$

where $S_t$ denotes the state variable at time $t$ and $v_j$ is the 1-step-ahead prediction error. This identity enables us to apply Theorem 11.1 to derive a recursive method to draw the state vectors jointly.


Return to the estimation of the SV model. As in Eq. (12.42), let $y_t = \ln[(r_t - x_t'\beta)^2/\sigma_0^2]$. To implement FFBS, one must determine $c_t$ and $H_t$ of Eq. (12.44) so that the mixture of normals provides a good approximation to the distribution of $\epsilon_t^*$. To this end, we augment the model with a series of independent indicator variables $\{I_t\}$, where $I_t$ assumes a value in $\{1, \ldots, 7\}$ such that $P(I_t = i) = p_{it}$ with $\sum_{i=1}^{7} p_{it} = 1$ for each $t$. In practice, conditioned on $\{z_t\}$, we can determine $c_t$ and $H_t$ as follows. Let

$$q_{it} = \Phi[(y_t - z_t - \mu_i)/w_i], \qquad \text{for } i = 1, \ldots, 7,$$

where $\mu_i$ and $w_i$ are the mean and standard error of the normal distributions given in Table 12.3 and $\Phi(\cdot)$ denotes the cumulative distribution function of the standard normal random variable. These probabilities $q_{it}$ are the likelihood function of $I_t$ given $y_t$ and $z_t$. The probabilities $p_i$ of Table 12.3 form a prior distribution of $I_t$. Therefore, the posterior distribution of $I_t$ is

$$p_{it} = \frac{p_i\, q_{it}}{\sum_{j=1}^{7} p_j\, q_{jt}}.$$

We can draw a realization of $I_t$ using this posterior distribution. If the random draw is $I_t = j$, then we define $c_t = \mu_j$ and $H_t = w_j^2$. In summary, conditioned on the return data and other parameters of the model, we employ the approximate linear Gaussian state-space model in Eqs. (12.43) and (12.44) to draw jointly the log volatility series $z$. It turns out that the resulting Gibbs sampling is efficient in estimating univariate stochastic volatility models.
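Drawing the indicator $I_t$ amounts to sampling from a seven-point discrete distribution. The sketch below is an illustration with one labeled assumption: it weights component $i$ by the normal density of $y_t - z_t$ with mean $\mu_i$ and variance $w_i^2$, the usual likelihood for a mixture indicator, rather than by the quantity $q_{it}$ exactly as printed. Substitute a different weight if that is what is intended. Function and variable names are hypothetical.

```python
import math
import random

# (p_i, mu_i, w_i^2) copied from Table 12.3
MIXTURE = [
    (0.00730, -11.4004, 5.7960),
    (0.10556, -5.2432, 2.6137),
    (0.00002, -9.8373, 5.1795),
    (0.04395, 1.5075, 0.1674),
    (0.34001, -0.6510, 0.6401),
    (0.24566, 0.5248, 0.3402),
    (0.25750, -2.3586, 1.2626),
]


def draw_indicator(y_t, z_t, rng, components=MIXTURE):
    """Return (index, c_t, H_t, posterior) for one time point; index is zero-based."""
    # Log of prior probability times normal density (assumed likelihood, see lead-in)
    logw = [math.log(p) - 0.5 * math.log(2.0 * math.pi * v)
            - 0.5 * (y_t - z_t - m) ** 2 / v
            for p, m, v in components]
    top = max(logw)                               # stabilize before exponentiating
    w = [math.exp(val - top) for val in logw]
    total = sum(w)
    post = [val / total for val in w]
    u, cum = rng.random(), 0.0
    for i, prob in enumerate(post):
        cum += prob
        if u <= cum:
            return i, components[i][1], components[i][2], post
    last = len(components) - 1                    # guard against rounding in the cumulative sum
    return last, components[last][1], components[last][2], post


rng = random.Random(3)
idx, c_t, H_t, post = draw_indicator(y_t=-11.4, z_t=0.0, rng=rng)
```

The returned pair `(c_t, H_t)` is what the Kalman filter of Eq. (12.45) uses for that time point in the next FFBS pass.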
On the other hand, the square transformation involved in Eq. (12.42) fails to retain the correlation between $\eta_t$ and $\epsilon_t$ if it exists, making the approximate state-space model in Eqs. (12.43) and (12.44) incapable of estimating the leverage effect. To overcome this inadequacy, Artigas and Tsay (2004) propose using a time-varying state-space model that maintains the leverage effect. Specifically, when $\rho \neq 0$, we have

$$\eta_t = \rho \sigma_\eta \epsilon_t + \eta_t^*,$$

where $\eta_t^*$ is a normal random variable independent of $\epsilon_t$ and $\mathrm{Var}(\eta_t^*) = \sigma_\eta^2(1 - \rho^2)$. The state transition equation of Eq. (12.43) then becomes

$$z_{t+1} = \alpha z_t + \rho \sigma_\eta \epsilon_t + \eta_t^*.$$

Substituting $\epsilon_t = (1/\sigma_0)(r_t - x_t'\beta) \exp(-z_t/2)$, we obtain

$$z_{t+1} = \alpha z_t + \rho \sigma_\eta \frac{(r_t - x_t'\beta)}{\sigma_0} \exp\left(\frac{-z_t}{2}\right) + \eta_t^*$$
$$= G(z_t) + \eta_t^*, \tag{12.52}$$

where $G(z_t) = \alpha z_t + \rho \sigma_\eta (r_t - x_t'\beta) \exp(-z_t/2)/\sigma_0$. This is a nonlinear transition equation for the state variable $z_t$. The Kalman filter in Eq. (12.45) is no longer applicable. To overcome this difficulty, Artigas and Tsay (2004) use a time-varying linear Kalman filter to approximate the system. Specifically, the last two equations of Eq. (12.45) are modified as

$$z_{t+1|t} = G(z_{t|t}),$$
$$\Sigma_{t+1|t} = g(z_{t|t})^2\, \Sigma_{t|t} + \sigma_\eta^2 (1 - \rho^2), \tag{12.53}$$

where $g(z_{t|t}) = \partial G(x)/\partial x \,|_{x = z_{t|t}}$ is the first-order derivative of $G(z_t)$ evaluated at the filtered state $z_{t|t}$.
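Differentiating $G$ gives $g(z) = \alpha - \tfrac{1}{2}\rho\sigma_\eta (r_t - x_t'\beta)\exp(-z/2)/\sigma_0$. The sketch below implements the modified prediction step of Eq. (12.53) with this derivative; the function names and the argument `resid`, standing for $r_t - x_t'\beta$, are illustrative assumptions.

```python
import math


def transition_G(z, resid, alpha, rho, sigma_eta, sigma0):
    """Nonlinear transition function G(z) of Eq. (12.52)."""
    return alpha * z + rho * sigma_eta * resid * math.exp(-z / 2.0) / sigma0


def transition_g(z, resid, alpha, rho, sigma_eta, sigma0):
    """First derivative of G with respect to z."""
    return alpha - 0.5 * rho * sigma_eta * resid * math.exp(-z / 2.0) / sigma0


def leverage_prediction(z_filt, S_filt, resid, alpha, rho, sigma_eta, sigma0):
    """Prediction step of Eq. (12.53): returns (z_{t+1|t}, Sigma_{t+1|t})."""
    z_pred = transition_G(z_filt, resid, alpha, rho, sigma_eta, sigma0)
    slope = transition_g(z_filt, resid, alpha, rho, sigma_eta, sigma0)
    S_pred = slope * slope * S_filt + sigma_eta ** 2 * (1.0 - rho ** 2)
    return z_pred, S_pred


# With rho = 0 the step reduces to the linear prediction equations of Eq. (12.45)
z_pred0, S_pred0 = leverage_prediction(z_filt=0.2, S_filt=0.5, resid=1.0,
                                       alpha=0.9, rho=0.0, sigma_eta=0.3, sigma0=1.0)
```

The linearization is local: it replaces $G$ by its tangent at $z_{t|t}$, so the quality of the approximation depends on how uncertain the filtered state is at that time point.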


Example 12.5. To demonstrate the FFBS procedure, we consider the monthly 
log returns of the S&P 500 index from January 1962 to November 2004 for 515 
observations. This is a subseries of the data used in Example 12.3. See Figure 12.4 
for time plots of the index and its log return. We consider two stochastic volatility 
models in the form: 


$$r_t = \mu + \sigma_0 \exp(z_t/2)\, \epsilon_t, \qquad \epsilon_t \sim_{iid} N(0, 1), \tag{12.54}$$
$$z_{t+1} = \alpha z_t + \eta_t, \qquad \eta_t \sim_{iid} N(0, \sigma_\eta^2).$$

In model 1, $\{\epsilon_t\}$ and $\{\eta_t\}$ are two independent Gaussian white noise series. That is, there is no leverage effect in the model. In model 2, we assume that $\mathrm{corr}(\epsilon_t, \eta_t) = \rho$, which denotes the leverage effect.

TABLE 12.4 Estimation of Stochastic Volatility Model in Eq. (12.54) for Monthly Log Returns of S&P 500 Index from January 1962 to November 2004 Using Gibbs Sampling with FFBS Algorithm^a

Parameter         $\mu$     $\sigma_0$   $\alpha$   $\sigma_\eta$   $\rho$
With Leverage Effect
Estimate          0.0081    0.0764       -0.0616    2.5639          -0.3892
Standard error    0.0274    0.0255       0.1186     0.3924          0.0292
Without Leverage Effect
Estimate          0.0080    0.0775       -0.0613    2.5827
Standard error    0.0279    0.0266       0.1164     0.3783

^a The results are based on 2000 + 8000 iterations with the first 2000 iterations as burn-ins.


We estimate the models via the FFBS procedure using a program written in Matlab. The Gibbs sampling was run for 2000 + 8000 iterations with the first 2000 iterations as burn-ins. Table 12.4 gives the posterior means and standard errors of the parameter estimates. In particular, we have $\hat{\rho} = -0.39$, which is close to the value commonly seen in the literature. Figure 12.13 shows the time plots of the posterior means of the estimated volatility. As expected, the two volatility series are very close. Compared with the results of Example 12.3, which uses a longer series, the estimated volatility series exhibit similar patterns and are of the same magnitude. Note that the volatility shown in Figure 12.6 is the conditional variance of percentage log returns, whereas the volatility in Figure 12.13 is the conditional standard error of log returns.

Figure 12.13 Estimated volatility of monthly log returns of S&P 500 index from January 1962 to November 2004 using stochastic volatility models: (a) with leverage effect and (b) without leverage effect.


12.9 MARKOV SWITCHING MODELS 


The Markov switching model is another econometric model for which MCMC 
methods enjoy many advantages over the traditional likelihood method. McCulloch 
and Tsay (1994b) discuss a Gibbs sampling procedure to estimate such a model 
when the volatility in each state is constant over time. These authors applied the 
procedure to estimate a Markov switching model with different dynamics and mean 
levels for different states to the quarterly growth rate of U.S. real gross national 
product, seasonally adjusted, and obtained some interesting results. For instance, the 
dynamics of the growth rate are significantly different between periods of economic 
“contraction” and “expansion.” Since this chapter is concerned with asset returns, 
we focus on models with volatility switching. 

Suppose that an asset return $r_t$ follows a simple two-state Markov switching 
model with different risk premiums and different GARCH dynamics: 

$$
r_t = \begin{cases}
\beta_1\sqrt{h_t} + \sqrt{h_t}\,\epsilon_t, \quad h_t = \alpha_{10} + \alpha_{11}h_{t-1} + \alpha_{12}a_{t-1}^2, & \text{if } s_t = 1, \\
\beta_2\sqrt{h_t} + \sqrt{h_t}\,\epsilon_t, \quad h_t = \alpha_{20} + \alpha_{21}h_{t-1} + \alpha_{22}a_{t-1}^2, & \text{if } s_t = 2,
\end{cases} \tag{12.55}
$$

where $a_t = \sqrt{h_t}\,\epsilon_t$, $\{\epsilon_t\}$ is a sequence of Gaussian white noises with mean zero 
and variance 1, and the parameters $\alpha_{ij}$ satisfy some regularity conditions so that 
the unconditional variance of $a_t$ exists. The probability transition from one state to 
another is governed by 

$$
P(s_t = 2 \mid s_{t-1} = 1) = e_1, \quad P(s_t = 1 \mid s_{t-1} = 2) = e_2, \tag{12.56}
$$

where $0 < e_i < 1$. A small $e_i$ means that the return series has a tendency to stay 
in the $i$th state with expected duration $1/e_i$. For the model in Eq. (12.55) to be 
identifiable, we assume that $\beta_2 > \beta_1$ so that state 2 is associated with higher risk 
premium. This is not a critical restriction because it is used to achieve uniqueness 
in labeling the states. A special case of the model results if $\alpha_{1j} = \alpha_{2j}$ for all $j$ 
so that the model assumes a GARCH model for all states. However, if $\beta_i\sqrt{h_t}$ is 
replaced by $\beta_i$, then model (12.55) reduces to a simple Markov switching GARCH 
model. 
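
The link between a transition probability $e_i$ and the expected duration $1/e_i$ can be checked by simulation. The following is a minimal sketch, not part of the original analysis: the function names, the values $e_1 = 0.1$ and $e_2 = 0.2$, and the equal-probability starting state are assumptions chosen only for illustration.

```python
import random

def simulate_states(n, e1, e2, seed=0):
    """Simulate a two-state Markov chain with P(1->2) = e1 and P(2->1) = e2."""
    rng = random.Random(seed)
    states = [1 if rng.random() < 0.5 else 2]   # equal-probability start (assumption)
    for _ in range(n - 1):
        leave = e1 if states[-1] == 1 else e2
        if rng.random() < leave:
            states.append(3 - states[-1])        # switch 1 <-> 2
        else:
            states.append(states[-1])
    return states

def mean_duration(states, which):
    """Average length of the completed runs spent in state `which`."""
    runs, length = [], 0
    for s in states:
        if s == which:
            length += 1
        elif length > 0:
            runs.append(length)
            length = 0
    return sum(runs) / len(runs)

states = simulate_states(200_000, e1=0.1, e2=0.2)
d1 = mean_duration(states, 1)
d2 = mean_duration(states, 2)
print(round(d1, 2), round(d2, 2))   # compare with 1/e1 = 10 and 1/e2 = 5
```

With a long simulated chain the two averages should be close to $1/e_1$ and $1/e_2$, which is the sense in which a small $e_i$ implies a persistent state.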

Model (12.55) is a Markov switching GARCH-M model. For simplicity, we 
assume that the initial volatility $h_1$ is given with value equal to the sample variance 
of $r_t$. A more sophisticated analysis is to treat $h_1$ as a parameter and estimate it 
jointly with other parameters. We expect the effect of fixing $h_1$ will be negligible 
in most applications, especially when the sample size is large. The “traditional” 
parameters of the Markov switching GARCH-M model are $\beta = (\beta_1, \beta_2)'$, $\alpha_i = 
(\alpha_{i0}, \alpha_{i1}, \alpha_{i2})'$ for $i = 1$ and 2, and the transition probabilities $e = (e_1, e_2)'$. The 
state vector $S = (s_1, s_2, \ldots, s_n)'$ contains the augmented parameters. The volatility 
vector $H = (h_2, \ldots, h_n)'$ can be computed recursively if $h_1$, $\alpha_i$, and the state 
vector $S$ are given. 
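
The recursion for $H$ can be sketched as follows, assuming the form of model (12.55) with $a_{t-1} = r_{t-1} - \beta_{s_{t-1}}\sqrt{h_{t-1}}$. The function name `volatility_path`, the dictionary layout of the parameters, and the three-point data set are hypothetical and serve only to illustrate the computation.

```python
def volatility_path(returns, states, beta, alpha, h1):
    """Compute h_1, ..., h_n for model (12.55) given the state vector.

    beta  : dict {state: beta_i}
    alpha : dict {state: (alpha_i0, alpha_i1, alpha_i2)}
    h1    : initial volatility, treated as known
    """
    h = [h1]
    for t in range(1, len(returns)):
        # Shock of the previous period, using the risk premium of its own state.
        a_prev = returns[t - 1] - beta[states[t - 1]] * h[t - 1] ** 0.5
        a0, a1, a2 = alpha[states[t]]            # GARCH parameters of the current state
        h.append(a0 + a1 * h[t - 1] + a2 * a_prev ** 2)
    return h

# Hypothetical inputs, for illustration only.
returns = [1.0, -2.0, 0.5]
states = [1, 1, 2]
beta = {1: 0.1, 2: 0.3}
alpha = {1: (1.0, 0.6, 0.2), 2: (2.0, 0.7, 0.1)}
h = volatility_path(returns, states, beta, alpha, h1=4.0)
print(h)
```

Because each $h_t$ depends on $h_{t-1}$, changing a single state $s_j$ alters every later volatility, which is why the state draws described later in this section involve the likelihood from time $j$ onward.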

Dependence of the return on volatility in model (12.55) implies that the return 
is also serially correlated. The model thus has some predictability in the return. 
However, states of the future returns are unknown and a prediction produced by 
the model is necessarily a mixture of those over possible state configurations. This 
often results in high uncertainty in point prediction of future returns. 

Turn to estimation. The likelihood function of model (12.55) is complicated as it 
is a mixture over all possible state configurations. Yet the Gibbs sampling approach 
only requires the following conditional posterior distributions: 


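$$
f(\beta \mid R, S, H, \alpha_1, \alpha_2), \quad f(\alpha_i \mid R, S, H, \alpha_{j \ne i}),
$$
$$
P(S \mid R, h_1, \alpha_1, \alpha_2), \quad f(e_i \mid S), \quad i = 1, 2,
$$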


where $R$ is the collection of observed returns. For simplicity, we use conjugate 
prior distributions discussed in Section 12.3, that is, 

$$
\beta_i \sim N(\beta_{io}, \sigma_{io}^2), \quad e_i \sim \mathrm{Beta}(\gamma_{i1}, \gamma_{i2}).
$$

The prior distribution of parameter $\alpha_{ij}$ is uniform over a properly specified interval. 
Since $\alpha_{ij}$ is a nonlinear parameter of the likelihood function, we use the Griddy 
Gibbs to draw its random realizations. A uniform prior distribution simplifies the 
computation involved. Details of the preceding conditional posterior distributions follow: 


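1. The posterior distribution of $\beta_i$ only depends on the data in state $i$. Define 

$$
r_{it} = \begin{cases} r_t/\sqrt{h_t} & \text{if } s_t = i, \\ 0 & \text{otherwise.} \end{cases}
$$

Then we have 

$$
r_{it} = \beta_i + \epsilon_t, \quad \text{for } s_t = i.
$$

Therefore, information of the data on $\beta_i$ is contained in the sample mean of $r_{it}$. Let 
$\bar{r}_i = (\sum_{s_t = i} r_{it})/n_i$, where the summation is over all data points in state $i$ and $n_i$ 
is the number of data points in state $i$. Then the conditional posterior distribution 
of $\beta_i$ is normal with mean $\beta_i^*$ and variance $\sigma_i^{*2}$, where 

$$
\frac{1}{\sigma_i^{*2}} = n_i + \frac{1}{\sigma_{io}^2}, \qquad \beta_i^* = \sigma_i^{*2}\left(n_i\bar{r}_i + \frac{\beta_{io}}{\sigma_{io}^2}\right), \quad i = 1, 2.
$$
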
2. Next, the parameters $\alpha_{ij}$ can be drawn one by one using the Griddy Gibbs 
method. Given $h_1$, $S$, $\alpha_{v \ne i}$, and $\alpha_{iv}$ with $v \ne j$, the conditional posterior 
distribution function of $\alpha_{ij}$ does not correspond to a well-known distribution, but it can 
be evaluated easily on the log scale. Up to an additive constant, each observation with 
$s_t = i$ contributes 

$$
-\frac{1}{2}\left[\ln h_t + \frac{(r_t - \beta_i\sqrt{h_t})^2}{h_t}\right]
$$

to $\ln f(\alpha_{ij} \mid \cdot)$, where $h_t$ contains $\alpha_{ij}$. We evaluate this function at a grid of points 
for $\alpha_{ij}$ over a properly specified interval. For example, $0 < \alpha_{11} < 1 - \alpha_{12}$. 

3. The conditional posterior distribution of $e_i$ only involves $S$. Let $\ell_1$ be the 
number of switches from state 1 to state 2 and $\ell_2$ be the number of switches from 
state 2 to state 1 in $S$. Also, let $n_i$ be the number of data points in state $i$. Then 
by Result 12.3 of conjugate prior distributions, the posterior distribution of $e_i$ is 
$\mathrm{Beta}(\gamma_{i1} + \ell_i, \gamma_{i2} + n_i - \ell_i)$. 

4. Finally, elements of $S$ can be drawn one by one. Let $S_{-j}$ be the vector 
obtained by removing $s_j$ from $S$. Given $S_{-j}$ and other information, $s_j$ can assume 
two possibilities (i.e., $s_j = 1$ or $s_j = 2$), and its conditional posterior distribution is 

$$
P(s_j \mid \cdot) \propto \prod_{t=j}^{n} f(a_t \mid H)\, P(s_j \mid S_{-j}).
$$

The probability 

$$
P(s_j = i \mid S_{-j}) = P(s_j = i \mid s_{j-1}, s_{j+1}), \quad i = 1, 2
$$

can be computed by the Markov transition probabilities in Eq. (12.56). In addition, 
assuming $s_j = i$, one can compute $h_t$ for $t \ge j$ recursively. The relevant likelihood 
function, denoted by $L(s_j)$, is given by 

$$
L(s_j = i) = \prod_{t=j}^{n} f(a_t \mid H) \propto \exp(f_{ji}), \qquad f_{ji} = \sum_{t=j}^{n} -\frac{1}{2}\left[\ln(h_t) + \frac{a_t^2}{h_t}\right],
$$

for $i = 1$ and 2, where $a_t = r_t - \beta_1\sqrt{h_t}$ if $s_t = 1$ and $a_t = r_t - \beta_2\sqrt{h_t}$ otherwise. 
Consequently, the conditional posterior probability of $s_j = 1$ is 

$$
P(s_j = 1 \mid \cdot) = \frac{P(s_j = 1 \mid s_{j-1}, s_{j+1})\,L(s_j = 1)}{P(s_j = 1 \mid s_{j-1}, s_{j+1})\,L(s_j = 1) + P(s_j = 2 \mid s_{j-1}, s_{j+1})\,L(s_j = 2)}.
$$

The state $s_j$ can then be drawn easily using a uniform distribution on the unit 
interval [0, 1]. 
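
Three of the four conditional draws above have closed forms and can be sketched compactly. The code below is an illustrative outline under stated assumptions, not the program used for the results in this chapter: all function names and numerical inputs are hypothetical, and the two log likelihoods `loglik1` and `loglik2` stand for $f_{j1}$ and $f_{j2}$, which would be computed separately from the volatility recursion. The sketch uses $P(s_j = i \mid s_{j-1}, s_{j+1}) \propto P(s_j = i \mid s_{j-1})\,P(s_{j+1} \mid s_j = i)$.

```python
import math
import random

def beta_posterior(r_scaled, prior_mean, prior_var):
    """Posterior mean and variance of beta_i from the scaled returns r_t/sqrt(h_t) in state i."""
    n_i = len(r_scaled)
    rbar = sum(r_scaled) / n_i
    post_var = 1.0 / (n_i + 1.0 / prior_var)
    post_mean = post_var * (n_i * rbar + prior_mean / prior_var)
    return post_mean, post_var

def transition_posterior(states, i, g1, g2):
    """Beta posterior parameters of e_i: switches out of state i and points in state i."""
    n_i = sum(1 for s in states if s == i)
    switches = sum(1 for a, b in zip(states, states[1:]) if a == i and b != i)
    return g1 + switches, g2 + n_i - switches

def prob_state1(prev, nxt, e1, e2, loglik1, loglik2):
    """P(s_j = 1 | .) from the transition probabilities and the two log likelihoods."""
    trans = {(1, 1): 1 - e1, (1, 2): e1, (2, 1): e2, (2, 2): 1 - e2}
    w1 = math.log(trans[(prev, 1)] * trans[(1, nxt)]) + loglik1
    w2 = math.log(trans[(prev, 2)] * trans[(2, nxt)]) + loglik2
    m = max(w1, w2)                       # subtract the maximum to avoid overflow
    return math.exp(w1 - m) / (math.exp(w1 - m) + math.exp(w2 - m))

rng = random.Random(1)
# Hypothetical inputs, for illustration only.
mean, var = beta_posterior([0.2, 0.4], prior_mean=0.3, prior_var=0.09)
beta_draw = rng.gauss(mean, math.sqrt(var))
a, b = transition_posterior([1, 1, 2, 2, 1, 1, 1], i=1, g1=5, g2=95)
e_draw = rng.betavariate(a, b)
p1 = prob_state1(prev=1, nxt=1, e1=0.1, e2=0.1, loglik1=-3.0, loglik2=-3.0)
s_j = 1 if rng.random() < p1 else 2       # draw the state with a uniform variate
```

Working with log likelihoods and subtracting the larger of the two before exponentiating is a numerical convenience; it leaves the posterior probability unchanged.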
Remark. Since $s_j$ and $s_{j+1}$ are highly correlated when $e_1$ and $e_2$ are small, it 
is more efficient to draw several $s_j$ jointly. However, the computation involved in 
enumerating the possible state configurations increases quickly with the number of 
states drawn jointly. 


Example 12.6. In this example, we consider the monthly log stock returns of 
General Electric Company from January 1926 to December 1999 for 888 observa- 
tions. The returns are in percentages and shown in Figure 12.14(a). For comparison 
purposes, we start with a GARCH-M model for the series and obtain 


$$
r_t = 0.182\sqrt{h_t} + a_t, \quad a_t = \sqrt{h_t}\,\epsilon_t,
$$
$$
h_t = 0.546 + 1.740h_{t-1} - 0.775h_{t-2} + 0.025a_{t-1}^2, \tag{12.57}
$$


where $r_t$ is the monthly log return and $\{\epsilon_t\}$ is a sequence of independent Gaussian 
white noises with mean zero and variance 1. All parameter estimates are highly 
significant with p values less than 0.0006. The Ljung-Box statistics of the 
standardized residuals and their squared series fail to suggest any model inadequacy. It 
is reassuring to see that the risk premium is positive and significant. The GARCH 
model in Eq. (12.57) can be written as 

$$
(1 - 1.765B + 0.775B^2)a_t^2 = 0.546 + (1 - 0.025B)\eta_t,
$$

where $\eta_t = a_t^2 - h_t$ and $B$ is the back-shift operator such that $Ba_t^2 = a_{t-1}^2$. As 
discussed in Chapter 3, the preceding equation can be regarded as an ARMA(2,1) model 
with nonhomogeneous innovations for the squared series $a_t^2$. The AR polynomial 
can be factorized as $(1 - 0.945B)(1 - 0.820B)$, indicating two real characteristic 
roots with magnitudes less than 1. Consequently, the unconditional variance of $r_t$ 
is finite and equal to $0.546/(1 - 1.765 + 0.775) \approx 49.64$. 

Figure 12.14 (a) Time plot of monthly log returns, in percentages, of GE stock from 1926 to 1999. 
(b) Time plot of the posterior probability of being in state 2 based on results of last 2000 iterations 
of Gibbs sampling with 5000 + 2000 total iterations. Model used is two-state Markov switching 
GARCH-M model. 


Turn to Markov switching models. We use the following prior distributions: 

$$
\beta_1 \sim N(0.3, 0.09), \quad \beta_2 \sim N(1.3, 0.09), \quad e_i \sim \mathrm{Beta}(5, 95).
$$

The initial parameter values used are (a) $e_i = 0.1$, (b) $s_1$ is a Bernoulli trial with 
equal probabilities and $s_t$ is generated sequentially using the initial transition 
probabilities, and (c) $\alpha_1 = (1.0, 0.6, 0.2)'$ and $\alpha_2 = (2, 0.7, 0.1)'$. Gibbs samples of $\alpha_{ij}$ 
are drawn using the Griddy Gibbs with 400 grid points, equally spaced over the 
following ranges: $\alpha_{i0} \in [0, 6.0]$, $\alpha_{i1} \in [0, 1]$, and $\alpha_{i2} \in [0, 0.5]$. In addition, we 
implement the constraints $\alpha_{i1} + \alpha_{i2} < 1$ for $i = 1, 2$. The Gibbs sampler is run 
for 5000 + 2000 iterations, but only results of the last 2000 iterations are used to 
make inference. 
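
A single Griddy Gibbs draw on an equally spaced grid can be sketched as follows. This is a simplified discrete version that returns one of the grid points with probability proportional to the evaluated density; the function name `griddy_gibbs_draw` and the toy log density are assumptions for illustration and do not reproduce the actual conditional posterior of $\alpha_{ij}$. The constraint $\alpha_{i1} + \alpha_{i2} < 1$ is imposed by assigning zero density to grid points that violate it.

```python
import math
import random

def griddy_gibbs_draw(log_density, lower, upper, n_grid, rng):
    """Draw one grid point with probability proportional to exp(log_density)."""
    step = (upper - lower) / (n_grid - 1)
    grid = [lower + k * step for k in range(n_grid)]
    logw = [log_density(x) for x in grid]
    m = max(logw)
    weights = [math.exp(v - m) for v in logw]    # rescale before exponentiating
    cum, running = [], 0.0
    for w in weights:                            # cumulative (unnormalized) distribution
        running += w
        cum.append(running)
    u = rng.random() * running                   # uniform draw on [0, total weight)
    for x, c in zip(grid, cum):
        if u < c:
            return x
    return grid[-1]

rng = random.Random(42)
alpha_i2 = 0.2                                   # hypothetical current value of alpha_i2

def toy_log_density(x):
    """Toy stand-in for ln f(alpha_i1 | .), with the constraint alpha_i1 + alpha_i2 < 1."""
    if x + alpha_i2 >= 1.0:
        return float("-inf")
    return -0.5 * ((x - 0.6) / 0.05) ** 2

draws = [griddy_gibbs_draw(toy_log_density, 0.0, 1.0, 400, rng) for _ in range(2000)]
draw_mean = sum(draws) / len(draws)
```

In an actual sampler the log density would be recomputed at every grid point from the volatility recursion, so the cost of each draw grows with both the number of grid points and the sample size.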

Table 12.5 shows the posterior means and standard deviations of parameters 
of the Markov switching GARCH-M model in Eq. (12.55). In particular, it also 
contains some statistics showing the difference between the two states such as 
$\theta = \beta_2 - \beta_1$. The difference between the risk premiums is statistically significant 
at the 5% level. The differences in posterior means of the volatility parameters 
between the two states appear to be insignificant. Yet the posterior distributions of 
volatility parameters show some different characteristics. Figures 12.15 and 12.16 
show the histograms of all parameters in the Markov switching GARCH-M model. 
They exhibit some differences between the two states. Figure 12.17 shows the 
time plot of the persistent parameter $\alpha_{i1} + \alpha_{i2}$ for the two states. It shows that the 
persistent parameter of state 1 reaches the boundary 1.0 frequently, but that of state 
2 does not. The expected durations of the two states are about 11 and 9 months, 
respectively. Figure 12.14(b) shows the posterior probability of being in state 2 for 
each observation. 

Finally, we compare the fitted volatility series of the simple GARCH-M model 
in Eq. (12.57) and the Markov switching GARCH-M model in Eq. (12.55). The two 
fitted volatility series (Figure 12.18) show similar patterns and are consistent with 
the behavior of the squared log returns. The simple GARCH-M model produces a 
smoother volatility series with lower estimated volatilities. 
TABLE 12.5 Fitted Markov Switching GARCH-M Model for Monthly Log Returns 
of GE Stock from January 1926 to December 1999^a 

                                        State 1
Parameter                  $\beta_1$   $e_1$   $\alpha_{10}$   $\alpha_{11}$   $\alpha_{12}$
Posterior mean             0.111       0.089   2.070           0.844           0.033
Posterior standard error   0.043       0.012   1.001           0.038           0.033

                                        State 2
Parameter                  $\beta_2$   $e_2$   $\alpha_{20}$   $\alpha_{21}$   $\alpha_{22}$
Posterior mean             0.247       0.112   2.740           0.869           0.068
Posterior standard error   0.050       0.014   1.073           0.031           0.024

                                Difference Between States
Parameter                  $\beta_2 - \beta_1$   $e_2 - e_1$   $\alpha_{20} - \alpha_{10}$   $\alpha_{21} - \alpha_{11}$   $\alpha_{22} - \alpha_{12}$
Posterior mean             0.135                 0.023         0.670                         0.026                         -0.064
Posterior standard error   0.063                 0.019         1.608                         0.050                         0.043

^a The numbers shown are the posterior means and standard deviations based on a Gibbs sampling with 
5000 + 2000 iterations. Results of the first 5000 iterations are discarded. The prior distributions and 
initial parameter estimates are given in the text. 

Figure 12.15 Histograms of risk premium and transition probabilities of a two-state Markov switching 
GARCH-M model for monthly log returns of GE stock from 1926 to 1999. Results based on last 2000 
iterations of Gibbs sampling with 5000 + 2000 total iterations. 

Figure 12.16 Histograms of volatility parameters of two-state Markov switching GARCH-M model 
for monthly log returns of GE stock from 1926 to 1999. Results based on last 2000 iterations of Gibbs 
sampling with 5000 + 2000 total iterations. 


12.10 FORECASTING 


Forecasting under the MCMC framework can be done easily. The procedure is 
simply to use the fitted model in each Gibbs iteration to generate samples for 
the forecasting period. In a sense, forecasting here is done by using the fitted 
model to simulate realizations for the forecasting period. We use the univariate 
stochastic volatility model to illustrate the procedure; forecasts of other models 
can be obtained by the same method. 

Figure 12.17 Time plots of persistent parameter $\alpha_{i1} + \alpha_{i2}$ of two-state Markov switching GARCH-M 
model for monthly log returns of GE stock from 1926 to 1999. Results based on last 2000 iterations of 
Gibbs sampling with 5000 + 2000 total iterations. 

Consider the stochastic volatility model in Eqs. (12.20) and (12.21). Suppose that 
there are $n$ returns available and we are interested in predicting the return $r_{n+i}$ and 
volatility $h_{n+i}$ for $i = 1, \ldots, \ell$, where $\ell > 0$. Assume that the explanatory variables 
$x_{jt}$ in Eq. (12.20) are either available or can be predicted sequentially during the 
forecasting period. Recall that estimation of the model under the MCMC framework 
is done by Gibbs sampling, which draws parameter values from their conditional 
posterior distributions iteratively. Denote the parameters by $\beta_j = (\beta_{0,j}, \ldots, \beta_{p,j})'$, 
$\alpha_j = (\alpha_{0,j}, \alpha_{1,j})'$, and $\sigma_{v,j}^2$ for the $j$th Gibbs iteration. In other words, at the $j$th 
Gibbs iteration, the model is 

$$
r_t = \beta_{0,j} + \beta_{1,j}x_{1t} + \cdots + \beta_{p,j}x_{pt} + a_t, \tag{12.58}
$$
$$
\ln h_t = \alpha_{0,j} + \alpha_{1,j}\ln h_{t-1} + v_t, \quad \mathrm{Var}(v_t) = \sigma_{v,j}^2. \tag{12.59}
$$

We can use this model to generate a realization of $r_{n+i}$ and $h_{n+i}$ for $i = 1, \ldots, \ell$. 
Denote the simulated realizations by $r_{n+i,j}$ and $h_{n+i,j}$, respectively. These 
realizations are generated as follows: 

• Draw a random sample $v_{n+1}$ from $N(0, \sigma_{v,j}^2)$ and use Eq. (12.59) to compute 
$h_{n+1,j}$. 

• Draw a random sample $\epsilon_{n+1}$ from $N(0, 1)$ to obtain $a_{n+1,j} = \sqrt{h_{n+1,j}}\,\epsilon_{n+1}$ 
and use Eq. (12.58) to compute $r_{n+1,j}$. 

• Repeat the prior two steps sequentially for $n + i$ with $i = 2, \ldots, \ell$. 
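
The three steps above can be sketched as a short simulation routine. This is an illustrative outline only: the function name `simulate_sv_path`, the last fitted volatility of 18.0, and the list `gibbs_draws` of parameter values are hypothetical stand-ins for the output of an actual Gibbs sampler, and the example uses a mean equation with a constant only ($p = 0$).

```python
import math
import random

def simulate_sv_path(h_n, x_future, beta, alpha, sigma_v, rng):
    """Simulate (r_{n+i}, h_{n+i}), i = 1..l, for one Gibbs draw of the parameters.

    x_future : list of l lists with the explanatory variables of each future period
    beta     : (beta_0, ..., beta_p) of this iteration
    alpha    : (alpha_0, alpha_1) of this iteration
    sigma_v  : standard deviation of v_t of this iteration
    """
    returns, vols = [], []
    log_h = math.log(h_n)
    for x in x_future:
        log_h = alpha[0] + alpha[1] * log_h + rng.gauss(0.0, sigma_v)   # Eq. (12.59)
        h = math.exp(log_h)
        a = math.sqrt(h) * rng.gauss(0.0, 1.0)                          # shock a_{n+i}
        mean = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))      # Eq. (12.58)
        returns.append(mean + a)
        vols.append(h)
    return returns, vols

rng = random.Random(7)
# Hypothetical stand-ins for the last N Gibbs draws of (beta, alpha, sigma_v).
gibbs_draws = [((0.6 + 0.05 * rng.gauss(0, 1),),
                (0.3 + 0.02 * rng.gauss(0, 1), 0.9),
                0.2) for _ in range(2000)]
horizon = 5
paths = [simulate_sv_path(18.0, [[]] * horizon, b, a, s, rng) for b, a, s in gibbs_draws]
# Sample means over the draws, one value per forecast horizon.
return_forecast = [sum(p[0][i] for p in paths) / len(paths) for i in range(horizon)]
vol_forecast = [sum(p[1][i] for p in paths) / len(paths) for i in range(horizon)]
```

Because a fresh parameter draw is used for every simulated path, the spread of the simulated values reflects parameter uncertainty as well as the randomness of future shocks.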


Figure 12.18 Fitted volatility series for monthly log returns of GE stock from 1926 to 1999: (a) squared 
log returns, (b) GARCH-M model in Eq. (12.57), and (c) two-state Markov switching GARCH-M model 
in Eq. (12.55). 

If we run a Gibbs sampling for $M + N$ iterations in model estimation, we only 
need to compute the forecasts for the last $N$ iterations. This results in a random 
sample for $r_{n+i}$ and $h_{n+i}$. More specifically, we obtain 

$$
\{r_{n+1,j}, \ldots, r_{n+\ell,j}\}_{j=1}^{N}, \qquad \{h_{n+1,j}, \ldots, h_{n+\ell,j}\}_{j=1}^{N}.
$$

These two random samples can be used to make inference. For example, point 
forecasts of the return $r_{n+i}$ and volatility $h_{n+i}$ are simply the sample means of the 
two random samples. Similarly, the sample variances can be used as the 
variances of forecast errors. To improve the computational efficiency in volatility 
forecast, importance sampling can be used; see Gelman, Carlin, Stern, and Rubin 
(2003). 


Example 12.7. (Example 12.3 continued) As a demonstration, we consider the 
monthly log return series of the S&P 500 index from 1962 to 1999. Table 12.6 
gives the point forecasts of the return and its volatility for five forecast horizons 
starting with December 1999. Both the GARCH model in Eq. (12.26) and the 
stochastic volatility model in Eq. (12.27) are used in the forecasting. The volatility 
forecasts of the GARCH(1,1) model increase gradually with the forecast horizon 
to the unconditional variance 3.349/(1 − 0.086 − 0.735) = 18.78. The volatility 
forecasts of the stochastic volatility model are higher than those of the GARCH 
model. This is understandable because the stochastic volatility model takes into 
consideration the parameter uncertainty in producing forecasts. In contrast, the 
GARCH model assumes that the parameters are fixed and given in Eq. (12.26). 
This is an important difference and is one of the reasons that GARCH models tend 
to underestimate the volatility in comparison with the implied volatility obtained 
from derivative pricing. 

TABLE 12.6 Volatility Forecasts for Monthly Log Return of S&P 500 Index^a 

Horizon        1        2        3        4        5

                      Log Return
GARCH        0.66     0.66     0.66     0.66     0.66
SVM          0.53     0.78     0.92     0.88     0.84

                      Volatility
GARCH       17.98    18.12    18.24    18.34    18.42
SVM         19.31    19.36    19.35    19.65    20.13

^a The data span is from January 1962 to December 1999 and the forecast origin is December 1999. 
Forecasts of the stochastic volatility model are obtained by a Gibbs sampling with 2000 + 2000 
iterations. 
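
The gradual rise of the GARCH(1,1) volatility forecasts can be illustrated with the standard multistep recursion, in which each forecast equals the constant term plus the persistence (the sum of the ARCH and GARCH coefficients) times the previous forecast. The sketch below takes the 1-step forecast 17.98 from Table 12.6 and the coefficients quoted in the text; the function name is hypothetical, and the later horizons agree with the table only approximately, presumably because the published coefficients are rounded.

```python
def garch11_forecasts(h1, omega, persistence, horizons):
    """Multistep volatility forecasts h(1), ..., h(l) of a GARCH(1,1) model.

    h1          : 1-step-ahead volatility forecast
    omega       : constant term of the volatility equation
    persistence : sum of the ARCH and GARCH coefficients
    """
    out = [h1]
    for _ in range(horizons - 1):
        out.append(omega + persistence * out[-1])
    return out

forecasts = garch11_forecasts(17.98, 3.349, 0.086 + 0.735, 5)
print([round(f, 2) for f in forecasts])
```

Since the persistence is below 1 and the starting value lies below $\omega/(1 - \text{persistence})$, the sequence increases monotonically toward that limit.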


Remark. Besides the advantage of taking into consideration parameter uncer- 
tainty in forecast, the MCMC method produces in effect a predictive distribution 
of the volatility of interest. The predictive distribution is more informative than a 
simple point forecast. It can be used, for instance, to obtain the quantiles needed 
in value at risk calculation. 
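
As a rough illustration of this remark, a lower quantile of simulated next-period returns can be turned into a value at risk figure. Everything in the sketch is an assumption made for illustration: the predictive draws are generated from an arbitrary toy distribution rather than from a fitted model, the quantile rule (the order statistic at position $\lceil Np \rceil$) is one of several common conventions, and the percentage log return is treated as an approximate percentage loss on a long position of $1 million.

```python
import math
import random

def empirical_quantile(sample, p):
    """Return the p-th sample quantile using the order statistic at ceil(n * p)."""
    ordered = sorted(sample)
    k = max(1, math.ceil(len(ordered) * p))
    return ordered[k - 1]

rng = random.Random(3)
# Hypothetical predictive draws of the next-period percentage log return:
# each draw uses its own simulated volatility, as in the MCMC forecasting scheme.
predictive_returns = [
    0.5 + math.sqrt(math.exp(rng.gauss(3.0, 0.3))) * rng.gauss(0.0, 1.0)
    for _ in range(2000)
]
q01 = empirical_quantile(predictive_returns, 0.01)
var_long = -q01 / 100 * 1_000_000      # approximate 1% VaR of a $1 million long position
```

The same sample also yields other summaries of the predictive distribution, such as its mean or upper quantiles, at no extra simulation cost.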


12.11 OTHER APPLICATIONS 


The MCMC method is applicable to many other financial problems. For example, 
Zhang, Russell, and Tsay (2008) use it to analyze information determinants of 
bid and ask quotes, McCulloch and Tsay (2001) use the method to estimate a 
hierarchical model for IBM transaction data, and Eraker (2001) and Elerian, Chib, 
and Shephard (2001) use it to estimate diffusion equations. The method is also 
useful in value at risk calculation because it provides a natural way to evaluate 
predictive distributions. The main question is not whether the methods can be used 
in most financial applications, but how efficient the methods can become. Only 
time and experience can provide an adequate answer to the question. 
EXERCISES 

12.1. Suppose that $x$ is normally distributed with mean $\mu$ and variance 4. Assume 
      that the prior distribution of $\mu$ is also normal with mean 0 and variance 25. 
      What is the posterior distribution of $\mu$ given the data point $x$? 

12.2. Consider the linear regression model with time series errors in Section 12.5. 
      Assume that $z_t$ is an AR($p$) process (i.e., $z_t = \phi_1 z_{t-1} + \cdots + \phi_p z_{t-p} + a_t$). 
      Let $\phi = (\phi_1, \ldots, \phi_p)'$ be the vector of AR parameters. Derive the conditional 
      posterior distributions of $f(\beta \mid Y, X, \phi, \sigma^2)$, $f(\phi \mid Y, X, \beta, \sigma^2)$, and 
      $f(\sigma^2 \mid Y, X, \beta, \phi)$ using the conjugate prior distributions, that is, the priors are 

      $$
      \beta \sim N(\beta_o, \Sigma_o), \quad \phi \sim N(\phi_o, A_o), \quad (v\lambda)/\sigma^2 \sim \chi_v^2.
      $$

12.3. Consider the linear AR($p$) model in Section 12.6.1. Suppose that $x_h$ and 
      $x_{h+1}$ are two missing values with a joint prior distribution being multivariate 
      normal with mean $\mu_o$ and covariance matrix $\Sigma_o$. Other prior distributions 
      are the same as that in the text. What is the conditional posterior distribution 
      of the two missing values? 

12.4. Consider the monthly log returns of Ford Motors stock from January 1965 
      to December 2008: (a) Build a GARCH model for the series, (b) build 
      a stochastic volatility model for the series, and (c) compare and discuss 
      the two volatility models. The simple returns of the stock are in the file 
      m-fsp6508.txt. 

12.5. Build a stochastic volatility model for the daily log return of Cisco Systems 
      stock from January 2001 to December 2008. You may download the simple 
      return of the stock from the CRSP database or the file d-csco0108.txt. 
      Transform the data into log returns in percentage. Use the model to obtain 
      a predictive distribution for 1-step-ahead volatility forecast at the forecast 
      origin December 31, 2008. Finally, use the predictive distribution to compute 
      the value at risk of a long position worth $1 million with probability 
      0.01 for the next trading day. 

12.6. Build a bivariate stochastic volatility model for the monthly log returns of 
      Ford Motors stock and the S&P composite index for the sample period from 
      January 1965 to December 2008. Discuss the relationship between the two 
      volatility processes and compute the time-varying beta for the Ford stock. 

12.7. Consider the monthly log returns of Procter & Gamble stock and the 
      value-weighted index from January 1965 to December 2008. The simple returns 
      are given in the file m-pgvw6508.txt. Transform the data into log returns 
      in percentages. (a) Build a bivariate stochastic volatility model for the two 
      return series. (b) Build a BEKK(1,1) model for the two series. (c) Compare 
      and discuss the two models. 

12.8. Consider the monthly data of 30-year mortgage rate and the 3-month Treasury 
      Bill rate of the secondary market from April 1971 to September 2009. 
      The data are in m-mort3mtb7109.txt. (a) Build a regression model with 
      time series error to study the effect of 3-month Treasury Bill rate on the 
      mortgage rate. (b) Reestimate the model using MCMC method. (c) Compare 
      and discuss the two fitted models. 